ABSTRACT

Since VHDL is a DoD standard hardware description language, it is
widely used in the design of logic circuits at different levels.
VHDL can be used for behavioral modeling, which is desirable in
top-down system design. A costfunction calculation in a graph
partition algorithm is used here as an example to test the VHDL
design methodology. Subroutines or statements in software can be
implemented in hardware if they are suitably grouped. In the
hardware design, high density integration of circuits is also a
primary goal. Parts of an old design were condensed using
programmable EPLDs, which were programmed by commercial software
development tools. The methodology of implementation goes from a
register transfer language description to data flow design and
control flow design. The costfunction calculation was successfully
put into 4 EP1800 chips and the design was simulated in VHDL. The
primary goal of integration was achieved at the expense of speed.
To support the total simulation, several behavior models were
created. Results of the simulation revealed that the adder circuit
in the EP1800 can be further improved.

I. INTRODUCTION

A. INTRODUCTION
There is strong interest in being able to detect signal tracks in
a lofargram in a noisy environment. A lofargram is a two
dimensional display of the power spectrum of acoustic sources with
respect to the time axis and the frequency axis. A marine vessel
with man-made noise will show up as tracks in a lofargram. One
method of detection in a noisy lofargram is to transform this
signal detection problem into a graph partition problem. Each
pixel of the lofargram corresponds to a node of a graph. The
horizontal chaining of the pixels along the time frame corresponds
to the edges of the graph. The constraints on track positions are
correlated to the graph precedence associated with the edges. The
problem of finding maximum signal-to-noise ratio tracks in a
lofargram becomes a problem of partitioning the graph so as to
obtain minimum cost [Ref. 1]. The costfunction is defined as:

$$Cf_{i,j} = \gamma \sigma_{i,j} - \eta_i$$

where $\eta_i$ is the averaged signal power along track i,

$\sigma_{i,j}$ is the averaged noise between track i and track j,

$\gamma$ is a threshold constant that determines the false alarm
rate.

$Cf_{i,j}$ is the incremental cost of partitioning the graph with
the new track i, given that the last partition at track j is
already accomplished. The original objective of achieving maximal
signal-to-noise ratio is changed to achieving minimum cost
$Cf_{i,j}$. When track i is located at the right cut of the graph,
the cost $Cf_{i,j}$ will be minimized. The graph partitioning
algorithm by Jensen [Ref. 2] enumerates all possible partitions of
a graph and evaluates them efficiently using a derived tree and the
costfunction. All possible partitions of a graph are enumerated in
the structure of a tree, which is generated sequentially using an
algorithm described in detail by Jensen. Moreover, the solution of
the partitioning algorithm is optimal with respect to the
costfunction used. The problem of partitioning was formulated as a
dynamic programming problem, which is equivalent to a search of the
tree. The tree generated by this algorithm contains all the
information required to locate the optimum partition of the graph.
This is a dynamic programming approach to finding a global optimum
solution.

An additive step costfunction for a set of nodes is defined above
so that the total cost is a sum of the individual step costs. In
effect, all possible pairs of partitions are evaluated using an
additive costfunction. The aim is to determine the minimal cost
over the space of partitions of the graph. At the end, the total
cost of a solution is given by the sum of the costs of all the
individual partitions.
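The additive structure described above can be sketched in C as
follows. This is a minimal illustration and not the thesis program:
the array bound MAX_NODES, the step_cost_fn callback, the
example_step function, and the convention that node 0 is a
zero-cost starting boundary are all assumptions made for the
example. The precedence test applied by the full algorithm is
omitted.

```c
#include <assert.h>
#include <limits.h>

#define MAX_NODES 64   /* hypothetical upper bound for this sketch */

/* Hypothetical step cost of a partition between nodes j and i. */
typedef int (*step_cost_fn)(int i, int j);

/*
 * Fill cost[] and opt[] so that cost[i] is the minimum sum of step
 * costs over all chains 0 = j0 < j1 < ... < i, and opt[i] is the
 * predecessor of i on that chain. Node 0 is assumed to cost zero.
 */
static void min_additive_cost(int n, step_cost_fn step,
                              int cost[], int opt[])
{
    assert(n > 0 && n <= MAX_NODES);
    cost[0] = 0;
    opt[0] = -1;
    for (int i = 1; i < n; i++) {
        cost[i] = INT_MAX;
        opt[i] = -1;
        for (int j = 0; j < i; j++) {
            /* total cost = step cost + best cost up to node j */
            int c = step(i, j) + cost[j];
            if (c < cost[i]) {
                cost[i] = c;
                opt[i] = j;
            }
        }
    }
}

/* Illustrative step cost only: squared gap minus 2. */
static int example_step(int i, int j)
{
    return (i - j) * (i - j) - 2;
}
```

Following opt[] backwards from the last node recovers the chain of
partitions whose step costs sum to the minimum.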

The calculation of the step costfunction occurs frequently in the
algorithm, and this calculation consumes a major portion of the
execution time. Wu et al. of the Naval Research Laboratory have
designed a hardware board to calculate the step costfunction. This
hardware design is called "PARTITION".

The contribution of this research is the development of a means to
use the VHDL hardware description language to model and simulate
the functions of the existing circuit. The circuit to be studied
performs the mathematical calculation of the COSTFUNCTION, which is
a subroutine in the graph partitioning algorithm. The COSTFUNCTION
calculation can be performed either in a C language program or in
an implemented logic circuit. The COSTFUNCTION calculation was
separated from a larger graph partitioning algorithm written in C
and implemented with TTL devices, a multiplier chip, and PAL chips.
The hardware of the graph partition algorithm is called the
PARTITION circuit. Parts of the circuit will be implemented in
condensed form using several Erasable Programmable Logic Devices
(EPLDs). In particular, the high density (2000 gates) EPLD, the
EP1800, is used for this purpose.

B.  VHDL  HARDWARE  DESCRIPTION  LANGUAGE 

VHDL stands for VHSIC Hardware Description Language. It is a new
hardware description language developed and standardized by the
U.S. Department of Defense for documentation and specification of
CAD microelectronics design [Ref. 3]. The language was developed to
address a number of recurrent problems in the design cycle, the
exchange of design information, and the documentation of digital
hardware. VHDL is technology independent and is not tied to a
particular simulator or logic value set. It also does not force a
design methodology on a designer [Ref. 4]. Many existing hardware
description languages operate at the logic and gate level.
Consequently, they are low-level logic design simulators. While
VHDL is perfectly suited to this level of description, it extends
beyond this to higher behavioral levels.

The study here, which uses VHDL to describe the costfunction logic
circuit, revealed both the advantages and the disadvantages of the
simulated circuit as well as the limitations of the procedure of
high density integration using the EP1800.

C.  ALTERA  ERASABLE  PROGRAMMABLE  LOGIC  DEVICES 

ALTERA development tools are available to program the Erasable
Programmable Logic Device (EPLD) [Ref. 5]. An EPLD is a combination
of CMOS devices and EPROM devices. The family of EPLDs spans the
range of density from 300 to over 2,000 gates. The ALTERA CAD tool,
A+PLUS, is used in this research. As shown in Figure 1, the package
allows mixed format design entries: Boolean equations, state
machine, netlist, and schematic capture. "Design Processing"
performs logic minimization, automatic device fitting, utilization
reporting, and creation of standard JEDEC programming files. Device
fitting is the PLD equivalent of an automatic place and route
capability. The ALTERA EPLD development tools are installed on an
IBM AT computer for this research.

Figure 1. A+PLUS Block Diagram
(adapted from ALTERA data book)

By using the ALTERA development tools, parts of the PARTITION
schematic design are translated into JEDEC files. These JEDEC files
will be read into the VHDL structural model of the EP1800 to
perform the programmed EPLD function.

D.  COSTFUNCTION 

Sonar lofargrams tend to be noisy and of low contrast. The feature
of interest is a line or track. The track detection problem is one
of translating the image processing problem into a graph such that
the nodes of the graph correspond to the pixels of the lofargram.
Any cut through the graph generates a track whose length equals the
number of time lines in the graph.

The graph weights are derived from some pre-defined measure such as
display pixel intensity value, which is an analogue of signal
strength. Target tracks are manifested as cuts through this graph.
The optimal partitioning of the graph can be based upon some
objective criterion such as the signal-to-noise ratio.

The costfunction is based roughly on signal-to-noise ratio:

$$Cf_{i,j} = \gamma \sigma_{i,j} - \eta_i$$

This is a weighted noise estimate less the signal estimate. The
track $t_i$ is modelled as being one pixel wide. The signal
estimate $\eta_i$ is obtained by integrating the graph weights
along the track path $t_i$. The noise estimate $\sigma_{i,j}$ is
determined from the mean weight of the nodes between $t_i$ and
$t_j$. The scaling constant $\gamma$ can be varied to change the
detection threshold.

The costfunction formula above can be mapped directly as

$$\mathrm{costfunction}(i,j) = \frac{\mathrm{table}[i] - \mathrm{table}[j]}{\mathrm{size}[i] - \mathrm{size}[j]} - \mathrm{acc}[i]$$

if $\gamma = 1$. The array table[] stores the accumulated pixel
intensity to the current track position and the array size[] stores
the accumulated number of nodes to the current track. The signal
estimate for each track path is stored in the array acc[]. This
formula was written in a C language program as follows:

    int costfunction(i, j)
    VERTEX i, j;
    {
        register int icost;

        /* noise estimate: accumulated intensity over node count */
        icost = (int) (table[i] - table[j]);
        icost /= (int) (size[i] - size[j]);
        /* subtract the signal estimate of track i */
        return (icost - (int) acc[i]);
    }

To speed up this subroutine, the software above is implemented in
hardware. The costfunction circuit, shown in Figure 2, consists of
TTL ICs, PROMs, RAMs, and PAL devices. Most PAL devices produce
control signals that coordinate the operations of the circuit in
complex timing sequences.

1. RTL description of the operations on PCB

The Register Transfer Language (RTL) description of the
costfunction circuit consists of 10 groups of register transfers as
shown in Figure 3. A group of transfers linked vertically indicates
concurrent transfers. When a transfer is made into memory, the
Memory Address Register (MAR) is used. A transfer from the location
in memory identified by the memory address register is specified
with square brackets. Each buffer register is given a name ending
in REG. These registers usually consist of D type flip-flops.

2.  Operations  of  the  Costfunction  Circuit 

Initially, when a main program is executed (on an IBM PC or
compatible), the computer communicates directly with the
costfunction Printed Circuit Board (PCB) in order to load values of
array ACC[i] and array TABLE[i] into the ACC RAM (IC37,35) and the
TABLE RAM (IC36,34), respectively, in Figure 2. The loading is
performed in IC40 by decoding the PC address bus into control
signals. These control signals are interpreted by IC08 to generate
asynchronous control sequences. After the control signals are
established, the data is transferred from the PC data bus into the
ACC RAM and the TABLE RAM, respectively, via IC33 and IC39. After
loading is completed, the costfunction PCB is ready to perform the
costfunction subroutine as discussed above.

Figure 2. Costfunction Circuit

Figure 2 (con't). Costfunction Circuit

Figure 3. RTL of the original Costfunction circuit

The basic operations of the costfunction circuit can be described
in RTL as shown in Figure 3.

When the subroutine is called, the values of i and j are passed
simultaneously through registers IC02,04 and IC11,19 and latched
into IC01,03 and IC10,18, respectively. This is shown as the first
RTL group in Figure 3. Then the value of i is latched through
IC01,03 in order to go to the ACC RAM (IC35,37), the TABLE RAM
(IC36,34), and the SIZE PROM (IC26,27) as an address, as shown in
the second group of RTL in Figure 3. The ACC RAM and the SIZE PROM
are enabled first. After ACC[i] is accessed and the value is passed
from the RAM via the adder circuit into the Y register and the
accumulator of the AM29510, the TABLE RAM is enabled to access the
value at location TABLE[i]. The outputs from the memory devices are
passed through IC32,38 and IC20,13 without negation into the adder
circuits. After the registers of the adders are latched, the
outputs of the adder registers are TABLE[i] and SIZE[i]. This is
described in the fifth group of RTL in Figure 3.

Next, the value of j is passed in the same way as i to access
TABLE[j] and SIZE[j], as shown in the sixth group and the seventh
group of the RTL description in Figure 3. Then the outputs from
both memories are negated by the inverter PALs (IC32,38 and
IC20,13) and passed into the adder circuits. This time each adder
performs two's complement addition, and the results after the
latching are the values of (TABLE[i] - TABLE[j]) and
(SIZE[i] - SIZE[j]) in IC23,24 and IC15,22.

The value of (SIZE[i] - SIZE[j]) is used to access the PROM
(IC16,17) to get the inverted (reciprocal) value. Then the output
from the PROM is gated into the X register of the multiplier (IC25)
by signal CLKX. At the same time, the value of
(TABLE[i] - TABLE[j]) is also gated into the Y register by signal
CLKY. This is shown as the ninth group of the RTL description in
Figure 3. The result of the multiplication is subtracted from the
previous value in the accumulator (ACC[i]), and the result is
stored back into the accumulator.

After the CLKP signal, the result of the costfunction is available
from the AM29510 at the P register together with the CF_DONE signal
and the CF_V signal from IC07 and IC08, respectively.
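The replacement of the division by a PROM lookup followed by a
multiplication can be illustrated with a short C sketch. The word
format of the PROM is not given in this text, so the 16-bit
fractional fixed-point format (RECIP_FRAC_BITS), the rounding of
the stored reciprocal, the restriction to a positive size
difference, and the function names recip_prom and
cost_by_reciprocal are all assumptions made for illustration. The
sign of the final subtraction follows the C subroutine (quotient
minus ACC[i]).

```c
#include <assert.h>
#include <stdint.h>

#define RECIP_FRAC_BITS 16   /* assumed format, not from the thesis */

/* Stand-in for the 1/SIZE PROM: rounded value of 2^16 / d, d > 0. */
static int32_t recip_prom(int32_t d)
{
    assert(d > 0);
    int64_t one = (int64_t)1 << RECIP_FRAC_BITS;
    return (int32_t)((one + d / 2) / d);
}

/* Multiply by the looked-up reciprocal, rescale, subtract ACC[i]. */
static int32_t cost_by_reciprocal(int32_t table_i, int32_t table_j,
                                  int32_t size_i, int32_t size_j,
                                  int32_t acc_i)
{
    int32_t dt = table_i - table_j;     /* TABLE[i] - TABLE[j] */
    int32_t ds = size_i - size_j;       /* SIZE[i]  - SIZE[j]  */
    int64_t product = (int64_t)dt * recip_prom(ds);
    /* drop the fractional bits; division truncates toward zero */
    int32_t quotient =
        (int32_t)(product / ((int64_t)1 << RECIP_FRAC_BITS));
    return quotient - acc_i;
}
```

Because the stored reciprocal has finite precision, the result of
such a lookup can differ from exact integer division by one count
for some operands.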

E.  OVERVIEW  OF  THE  THESIS 

This thesis is divided into five chapters. Chapter I introduces the
costfunction circuit, VHDL, and the ALTERA software package. How
the costfunction circuit was partitioned is discussed in Chapter
II. Chapter III covers VHDL behavior modeling of some specific ICs
in the circuit. Simulation of the total costfunction circuit and
its results are discussed in Chapter IV, which also covers the
improvement of the design. Finally, Chapter V gives the conclusions
and suggestions for possible future research.

II. FUNCTIONS OF THE PARTITION CIRCUIT DESIGN

A. C PROGRAM TO RTL AND SCHEMATIC DIAGRAM

The costfunction calculates the signal-to-noise ratio of each track
partition between $t_i$ and $t_j$ and gives the result back to the
graph partition algorithm. The costfunction subroutine is written
in the C language. This subroutine is called from the program
PARTITION, which performs the graph partition algorithm. The
program is shown in Figure 4.
    #include "part.h"
    #include <stdio.h>

    void partition(nodecount)
    int nodecount;
    {
        register int i, j, c;
        BOOLEAN status;
        int cost[MAXNODE];
        FILE *fp, *fopen();
        int rinval;
        int costfunction();

        for (i = 0; i < nodecount; i++)
        {
            cost[i] = BIGINT;
            opt[i] = nullset;
        }
        for (i = 1; i < nodecount; i++)
            for (j = 0; j < i; j++)
            {
                /* check the precedence constraint on every line */
                status = TRUE;
                for (c = 0; c < lines; c++)
                    if (setsize[i][c] <= setsize[j][c])
                    {
                        status = FALSE;
                        break;
                    }
                if (status)
                {
                    c = costfunction(i, j) + cost[j];
                    if ((cost[i]) >= c)
                    {
                        cost[i] = c;
                        opt[i] = j;
                    }
                }
            }
    }

    int costfunction(i, j)    /* cost of segment */
    VERTEX i, j;
    {
        register int icost;

        icost = (int) (table[i] - table[j]) * 6;
        icost /= (int) (size[i] - size[j]);
        return (icost - (int) acc[i]);
    }

Figure 4. Program PARTIT.C

In order to implement this subroutine in hardware, each statement
in the program must be considered carefully. Normally, the internal
hardware activities consist of both the control flow and the data
flow. The control flow can be implemented easily if the data flow
is well described in the RTL. The RTL shows the sequence of
operations of each functional module at different times [Ref. 6].
A simple block diagram of the costfunction hardware is shown in
Figure 5. This block diagram is more abstract than the original
circuit shown in Figure 2. It shows how the EPLDs are used in the
implementation. Data paths are implemented according to this
partition. The modules of the block diagram, their names, and their
functions are as follows:

Figure 5. Simple block diagram of the costfunction

• CHIP4i: This is an input buffer for value i.

• CHIP4j: This is an input buffer for value j.

• ACC RAM memory: This memory is used to store values of the
array ACC.

• TABLE RAM memory: This memory is used to store values of the
array TABLE.

• SIZE PROM memory: This memory contains the constant values of
array SIZE. The memory chips chosen here are Programmable Read
Only Memory (PROM).

• CHIP3: This is an EP1800 device which is programmed to pass the
value of ACC[i] and also perform the TABLE[i] - TABLE[j]
operation. There is an internal output register.

• CHIP1: This is an EP1800 device which is programmed to perform
the SIZE[i] - SIZE[j] operation. This chip also has an internal
output register.

• 1/SIZE PROM memory: This is a PROM memory which contains the
inversion of the input value from CHIP1.

• AM29510: This module is responsible for the multiplication needed
in the algorithm. It will multiply, then add the value to or
subtract it from the accumulator. There are three registers
internal to the module: two input registers (X, Y) and one output
register (P).

Once the data path is decided for the basic algorithm, the RTL can
be written as shown in Figure 6. Once the RTL is completed, a state
diagram associated with the control flow can be identified. This
state diagram can then be used directly to implement an appropriate
controller.

B.  RTL  DESCRIPTION  FOR  THE  REDESIGNED  COSTFUNCTION  PCB 

The original costfunction circuit in Figure 2 was previously
described by the block diagram of Figure 5. This diagram breaks the
circuit into several blocks. The redesigned costfunction circuit
implements some of these blocks in EPLDs.

The RTL in Figure 6 represents the operations of the redesigned
costfunction circuit using EPLDs. The RTL description identifies
the appropriate calculations as well as the possible parallelism of
the events. The system waits for the values of i and j to be loaded
into the input buffer registers; at that point the data processing
begins. The value of i from the CHIP4i register is loaded as an
address into the ACC RAM, the TABLE RAM, and the SIZE PROM
concurrently, as shown in step two of Figure 6. Then the CHIP3
register is loaded with ACC[i] from the ACC RAM in step three. The
next step is to transfer the content of the CHIP3 register to the
Y register and simultaneously clear the CHIP3 register and the
CHIP1 register. The value of the Y register is also transferred to
the P register in step four. At the next step the CHIP3 register
and the CHIP1 register are loaded with TABLE[i] and SIZE[i] from
the TABLE RAM and the SIZE PROM, respectively. The next group of
transfers, step six, provides the value of j from CHIP4j as an
address into the TABLE RAM and the SIZE PROM. Then the values of
TABLE[j] and SIZE[j] are subtracted from the previous TABLE[i] and
SIZE[i], and the results are placed into their registers as shown
in step seven. Next, in step eight, the value from the CHIP1
register is loaded into the 1/SIZE PROM as an address. In step
nine, the value from the 1/SIZE PROM is loaded into the X register.
At the same time, the value from the CHIP3 register is loaded into
the Y register. At this point, the CHIP3 register and the CHIP1
register are also cleared. The final calculation is then done, and
the result is transferred to the P register as well as the output
port.

    Step 1:   i                             -> CHIP4i_REG
              j                             -> CHIP4j_REG
    Step 2:   CHIP4i_REG                    -> ACC_RAM_MAR
                                            -> SIZE_RAM_MAR
                                            -> TABLE_RAM_MAR
    Step 3:   ACC_RAM[MAR]                  -> CHIP3_REG
    Step 4:   CHIP3_REG                     -> AM29510_Y_REG
                                            -> AM29510_P_REG
              0                             -> CHIP3_REG
              0                             -> CHIP1_REG
    Step 5:   TABLE_RAM[MAR]                -> CHIP3_REG
              SIZE_RAM[MAR]                 -> CHIP1_REG
    Step 6:   CHIP4j_REG                    -> SIZE_RAM_MAR
                                            -> TABLE_RAM_MAR
    Step 7:   CHIP3_REG - TABLE_RAM[MAR]    -> CHIP3_REG
              CHIP1_REG - SIZE_RAM[MAR]     -> CHIP1_REG
    Step 8:   CHIP1_REG                     -> 1/SIZE_PROM_MAR
    Step 9:   1/SIZE_PROM[MAR]              -> AM29510_X_REG
              CHIP3_REG                     -> AM29510_Y_REG
              0                             -> CHIP3_REG
              0                             -> CHIP1_REG
    Step 10:  AM29510_X_REG * AM29510_Y_REG
                - AM29510_P_REG             -> AM29510_P_REG
              AM29510_P_REG                 -> OUTPUT

Figure 6. RTL of the EPLD implementation of the costfunction
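The ten register-transfer steps can be traced with a small C
sketch. It is an illustration only, under several assumptions: the
struct datapath layout, the array length NUM_NODES, and the
function name run_costfunction are hypothetical, and the 1/SIZE
PROM lookup followed by the multiplication is replaced by an
integer division as a stand-in, so the X register here holds the
divisor rather than its reciprocal. The order of the final
subtraction follows the RTL of Figure 6.

```c
#include <assert.h>

#define NUM_NODES 8   /* hypothetical array length for this sketch */

struct datapath {
    int chip4i, chip4j;   /* input buffer registers             */
    int mar;              /* address seen by ACC, TABLE, SIZE   */
    int chip3, chip1;     /* EP1800 output registers            */
    int x, y, p;          /* AM29510 registers                  */
};

static int run_costfunction(struct datapath *d, const int acc[],
                            const int table[], const int size[],
                            int i, int j)
{
    assert(i >= 0 && i < NUM_NODES && j >= 0 && j < NUM_NODES);
    d->chip4i = i;                        /* step 1 */
    d->chip4j = j;
    d->mar = d->chip4i;                   /* step 2 */
    d->chip3 = acc[d->mar];               /* step 3 */
    d->y = d->chip3;                      /* step 4 */
    d->p = d->chip3;
    d->chip3 = 0;
    d->chip1 = 0;
    d->chip3 = table[d->mar];             /* step 5 */
    d->chip1 = size[d->mar];
    d->mar = d->chip4j;                   /* step 6 */
    d->chip3 -= table[d->mar];            /* step 7 */
    d->chip1 -= size[d->mar];
    assert(d->chip1 != 0);                /* step 8: PROM address */
    d->x = d->chip1;                      /* step 9 (stand-in: divisor) */
    d->y = d->chip3;
    d->chip3 = 0;
    d->chip1 = 0;
    d->p = d->y / d->x - d->p;            /* step 10 (division stand-in) */
    return d->p;                          /* P register -> OUTPUT */
}
```

Writing the steps as straight-line assignments hides the
concurrency inside each step, but it makes the order of the
transfers between steps easy to check against the figure.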

C.  FROM  RTL  DESCRIPTION  TO  STATE  DIAGRAM 

The state diagram shown in Figure 7 represents a control section
that can manipulate the control signals in such a way that the
simultaneity specified in the RTL is maintained. The RTL
description shows the sequence of register transfer operations. In
each step of operation, some events can occur concurrently.
Therefore, the control signals generated in each step of the state
machine can be identified as follows:

• State 0 accepts the values of i and j. The ENABLE_H signal is
asserted high so that i is transferred to the ACC RAM, the TABLE
RAM, and the SIZE PROM as an address. The ENABLE_H signal is
maintained until the end of state 4.

• State 1 is a wait state which allows a delay for the memory to be
accessed. The WAIT_H signal is asserted from the "buried" state bit
inside the EP1800. This bit is used to insert the wait states that
satisfy the setup time requirement between the memory address
becoming valid and the accumulator latches.

• State 2 asserts the LATCH signal, which causes ACC[i] to be
placed into the CHIP3 register. The control status register of the
AM29510 is also loaded in this state by asserting the CLK_X signal.

• State 3 causes three things to happen. The CLK_Y signal is
asserted to transfer ACC[i] from the CHIP3 register into the Y
register of the AM29510. The value also goes to the P register,
which is the accumulator of the AM29510. The CLEAR signal is
asserted to clear the registers of CHIP3 and CHIP1. At the same
time, the RAM/MAC signal is asserted to enable access to the TABLE
RAM. This signal also sets the two's complement input mode and the
accumulator mode of the AM29510. The RAM/MAC_H signal is maintained
until the end of state 8.

• State 4 is also a wait state, which allows the delay for TABLE[i]
and SIZE[i] in the RAM and the PROM to be accessed.

• State 5 asserts the LATCH signal to place TABLE[i] and SIZE[i]
into the CHIP3 register and the CHIP1 register. The ENABLE signal
is not asserted, which causes j to be applied as the address to the
TABLE RAM and the SIZE PROM.

• State 6 is also a wait state for memory access. In this state the
INV signal is asserted high to get the two's complement values
-TABLE[j] and -SIZE[j], respectively.

• State 7 asserts the LATCH signal in order to place
TABLE[i] - TABLE[j] and SIZE[i] - SIZE[j] into the CHIP3 register
and the CHIP1 register. The value in the CHIP1 register causes the
1/SIZE PROM memory to be accessed.

• State 8 causes two things to happen. The CLK_X and CLK_Y signals
are asserted to load the values from the CHIP3 register and the
1/SIZE PROM into the X register and the Y register. The CLEAR
signal is also asserted to clear the CHIP3 register and the CHIP1
register afterwards.

• State 9 is a wait state for the internal calculation inside the
AM29510 multiplier.

• State 10 causes the result to be placed into the output register
by asserting the CLK_Y signal. The CF_DONE signal is also asserted
to indicate that a valid value is available at the output port.

    Present state   Activities
    -------------   -------------------------------------
    STATE 0         ENABLE_H
    STATE 1         ENABLE_H, WAIT_H
    STATE 2         ENABLE_H, LATCH_H, CLKX_H
    STATE 3         CLEAR_L, ENABLE_H, RAM/MAC_H, CLKY_H
    STATE 4         ENABLE_H, RAM/MAC_H, WAIT_H
    STATE 5         LATCH_H, RAM/MAC_H
    STATE 6         INV_H, RAM/MAC_H, WAIT_H
    STATE 7         LATCH_H, INV_H, RAM/MAC_H
    STATE 8         CLEAR_L, RAM/MAC_H, CLKX_H, CLKY_H
    STATE 9         CLEAR_L
    STATE 10        CLEAR_L, CLKY_H, CF_DONE_H

Figure 7. State diagram of costfunction
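The per-state control signals listed above can be captured as a
lookup table, which is one common way to describe such a controller
in software before committing it to logic. The sketch below is
illustrative only: the CTL_ constant names, the bit assignments,
and the helper is_asserted are hypothetical, and a set bit means
"asserted" regardless of whether the physical signal is active high
or active low (CLEAR is active low in Figure 7).

```c
#include <assert.h>

/* One bit per control signal; a set bit means "asserted". */
enum {
    CTL_ENABLE  = 1 << 0,
    CTL_WAIT    = 1 << 1,
    CTL_LATCH   = 1 << 2,
    CTL_CLK_X   = 1 << 3,
    CTL_CLK_Y   = 1 << 4,
    CTL_CLEAR   = 1 << 5,   /* active low on the board */
    CTL_RAM_MAC = 1 << 6,
    CTL_INV     = 1 << 7,
    CTL_CF_DONE = 1 << 8
};

#define NUM_STATES 11

/* Control word for each state, transcribed from Figure 7. */
static const unsigned control_word[NUM_STATES] = {
    /* 0  */ CTL_ENABLE,
    /* 1  */ CTL_ENABLE | CTL_WAIT,
    /* 2  */ CTL_ENABLE | CTL_LATCH | CTL_CLK_X,
    /* 3  */ CTL_CLEAR | CTL_ENABLE | CTL_RAM_MAC | CTL_CLK_Y,
    /* 4  */ CTL_ENABLE | CTL_RAM_MAC | CTL_WAIT,
    /* 5  */ CTL_LATCH | CTL_RAM_MAC,
    /* 6  */ CTL_INV | CTL_RAM_MAC | CTL_WAIT,
    /* 7  */ CTL_LATCH | CTL_INV | CTL_RAM_MAC,
    /* 8  */ CTL_CLEAR | CTL_RAM_MAC | CTL_CLK_X | CTL_CLK_Y,
    /* 9  */ CTL_CLEAR,
    /* 10 */ CTL_CLEAR | CTL_CLK_Y | CTL_CF_DONE
};

/* Return nonzero if the given signal is asserted in the state. */
static int is_asserted(int state, unsigned signal)
{
    assert(state >= 0 && state < NUM_STATES);
    return (control_word[state] & signal) != 0;
}
```

A table of this kind makes it straightforward to check the
statements in the text, for example that ENABLE is held through
state 4 and RAM/MAC through state 8.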
D. BACKGROUND OF THE EP1800 STRUCTURE

The EP1800 is an erasable, user-configurable LSI device that has
2100 equivalent logic gates [Ref. 7]. Externally, the EP1800
provides 16 dedicated inputs, 4 of which may be used as system
clock inputs. There are 48 I/O pins which may be individually
configured for input, output, or bidirectional data flow as shown
in Figure 8. Internally, the EP1800 architecture consists of a
series of macrocells. Logic is implemented within these cells. Each
macrocell contains 3 basic elements: a logic array, a selectable
register element, and a tri-state I/O buffer.

The EP1800 is partitioned into four identical quadrants. Each
quadrant contains 12 macrocells. Input signals into macrocells can
come from the EP1800 internal bus. Macrocell outputs may drive the
external pins as well as the internal buses.

Sixteen of the 48 macrocells offer increased speed performance
through the logic array. These "enhanced macrocells" can be used
for critical combinatorial logic with short delay paths. There are
4 enhanced macrocells in each quadrant.

Another kind of macrocell provides dual functions. These "global
macrocells" allow implementation of buried logic functions and, at
the same time, serve as dedicated input pins. The global macrocells
have the same timing characteristics as the general macrocells.

[Block diagram showing quadrants A, B, C, and D; legend: general
macrocells, global macrocells, enhanced macrocells]

Figure 8. EP1800 block diagram
(adapted from ALTERA data sheet)
A structural model of the EP1800 in VHDL was written by a previous
thesis student [Ref. 8]. This model will be used in this research.

E. IMPLEMENTATION OF THE FUNCTION A = A + B

In order to implement this algebraic function in one chip, a full
adder circuit and a register must be combined. Since the current
VHDL model of the EP1800 can only represent D type flip-flop
registers, the selection of the D-FF in the ALTERA design tools is
mandatory. A register is necessary to store the value of the
previous state (the value of A). This stored value is then fed back
to one input of the adder to be added to the other input (the value
of B). The result of the addition can be latched into the D-FF
register after the rising edge of the clock signal.

By using Schematic Entry in the ALTERA software, a 1-bit full adder
is built from the ALTERA primitives. ALTERA also has an output
primitive which has both a register and a feedback pin. This is
called a RORF primitive. The circuit of a 1-bit full adder with
RORF is shown in Figure 9. The CLR signal is used to reset the
output of the register to zero. As an example, a count-up operation
can be done using this adder, following the flow chart shown in
Figure 10. Initially an input of 1 is set to A. What needs to be
done next is to clock the circuit repeatedly.
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Figure  9. 


1-bit  full  adder  with  register 
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Figure  10.  Simple  countup  flowchart 

This  circuit  of  Figure  9  has  4  input  signals  with 
initialized  value  to  ground  (logic  Low) .  There  are  2  outputs 
SUM  and  CARRY  in  the  circuit.  Every  input  signal  or  output 
signal  is  assigned  a  stub  number  to  facilitate  the  upper  level 
design  in  ALTERA  environment.  This  circuit  of  1-bit  full  adder 
with  register  is  put  into  a  library  of  macrofunctions  for 
later  use  in  upper  hierarchy  designs. 
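
The registered 1-bit adder and the countup example described above can be sketched in a few lines of Python. This is only a behavioral illustration, not the ALTERA schematic or the thesis's VHDL: the class name `RegisteredAdderBit`, the argument names, and the choice to treat the carry as a purely combinational output are assumptions made for the example.

```python
class RegisteredAdderBit:
    """1-bit full adder whose SUM output is stored in a D flip-flop and fed back."""

    def __init__(self):
        self.q = 0  # register output, fed back to one adder input

    def clear(self):
        """CLR: reset the register output to zero."""
        self.q = 0

    def clock(self, b, carry_in=0):
        """Model one rising clock edge; return the carry seen at that edge."""
        a = self.q
        total = a ^ b ^ carry_in                 # SUM of a full adder
        carry = (a & b) | (carry_in & (a ^ b))   # CARRY of a full adder
        self.q = total                           # SUM is latched into the D-FF
        return carry


cell = RegisteredAdderBit()
cell.clear()
history = []
for _ in range(4):
    carry = cell.clock(b=1)   # external input held at 1, then clock repeatedly
    history.append((cell.q, carry))
print(history)                # [(1, 0), (0, 1), (1, 0), (0, 1)]
```

With a single bit the count simply toggles and raises the carry every second clock; a wider counter needs the carry chained into the next cell.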

A 4-bit full adder with register can be built from the
1-bit full adder macrofunction. The macrofunctions are
connected in sequence as shown in Figure 11. Between each pair
of adjacent macrofunctions there is an ALTERA primitive called
NOCF. This primitive separates the product terms of the carry
out of one 1-bit macrofunction from those of the next bit so
that they are implemented in two different EP1800 macrocells.
Otherwise, the ALTERA system gives the error message "too many
p-terms for a single macrocell". The input stubs and output
stubs are again initialized to logic low, and the stub numbers
are shown in the figure. This 4-bit circuit is also placed in
the macrofunction library.

Figure 11. 4-bit full adder with register

Finally, a 16-bit full adder with register can be created
from the 4-bit adder macrofunctions. NOCF primitives are again
used to break up the total p-terms as mentioned before. The
circuit diagram is shown in Figure 12. In this circuit, XOR
gates are used to complement the input value under control of
the INV signal. Since the CLR signal in the original design is
active low, an inverter primitive is used to meet the active
high requirement. The LATCH signal needs a clock buffer
primitive because this causes the ALTERA Design Fitter to use a
programmable clock pin. Otherwise, an externally connected
LATCH signal to the clock pin of each module would be used.
Consequently, the number of LATCH input pins is reduced from 4
to 1. The clock buffer primitives also impose a longer delay
between the latch input and the clock input of the D-FF. This
eases the setup time requirement for the flip-flop and ensures
that the D-FF is not clocked before the actual data arrives.
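
The 1-bit, 4-bit, 16-bit hierarchy of macrofunctions can be mirrored by three small Python functions, each built only from the level below it. The function names `add1`, `add4` and `add16` are hypothetical, and the sketch covers only the combinational ripple-carry path; the registers, the NOCF primitives and the INV and CLR logic are left out.

```python
def add1(a, b, carry_in):
    """1-bit full adder: returns (sum, carry_out)."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def add4(a, b, carry_in=0):
    """4-bit adder made of four add1 cells connected in sequence."""
    total = 0
    carry = carry_in
    for bit in range(4):
        s, carry = add1((a >> bit) & 1, (b >> bit) & 1, carry)
        total |= s << bit
    return total, carry


def add16(a, b, carry_in=0):
    """16-bit adder made of four add4 blocks; the carry ripples between them."""
    total = 0
    carry = carry_in
    for nibble in range(4):
        shift = 4 * nibble
        s, carry = add4((a >> shift) & 0xF, (b >> shift) & 0xF, carry)
        total |= s << shift
    return total, carry


print(add16(0x00FF, 0x0001))  # (256, 0): the carry ripples out of the low byte
print(add16(0xFFFF, 0x0001))  # (0, 1): the carry ripples through all 16 bits
```

The second call is the worst case for a ripple-carry adder, because the carry generated at bit 0 has to pass through every cell before the most significant bit settles.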

Figure 12. 16-bit full adder with register (CHIP3)

F. IMPLEMENTATION OF THE FUNCTION TABLE[i] - TABLE[j]

This function can be performed in hardware by using the
circuit module from the previous section and additional
control signals from a controller. The sequence of operations
is as follows:

• Clear the D-FF with the CLR signal. This causes A = 0.

• Set B = TABLE[i]. After the LATCH signal, A = TABLE[i].

• Set B = TABLE[j] and assert the INV signal to form the
two's complement, -TABLE[j]. After the LATCH signal,
A = TABLE[i] - TABLE[j].
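
The three-step sequence above can be traced with a word-level Python sketch of the registered adder. The class `Chip3Model` and its method names are hypothetical. The text only states that XOR gates complement the input under INV, so the carry-in of 1 that completes the two's complement is an assumption here, and is marked as such in the code.

```python
MASK16 = 0xFFFF


def to_signed16(value):
    """Interpret a 16-bit pattern as a two's complement integer."""
    return value - 0x10000 if value & 0x8000 else value


class Chip3Model:
    """Word-level sketch of the 16-bit adder with register."""

    def __init__(self):
        self.a = 0  # contents of the D-FF output register

    def clear(self):
        self.a = 0  # CLR forces A = 0

    def latch(self, b, inv=0):
        operand = (b ^ MASK16) if inv else b   # XOR gates complement the input
        carry_in = 1 if inv else 0             # ASSUMPTION: INV also injects +1
        self.a = (self.a + operand + carry_in) & MASK16
        return self.a


chip = Chip3Model()
chip.clear()                    # A = 0
chip.latch(15)                  # A = TABLE[i]
result = chip.latch(1, inv=1)   # A = TABLE[i] - TABLE[j]
print(result)                   # 14
```

The operands 15 and 1 are the TABLE[i] and TABLE[j] test values used in the simulation chapter. Swapping them leaves the pattern 0xFFF2 in the register, which `to_signed16` reads as -14.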

The original circuit for this function is shown in Figure
13, which is extracted from the original costfunction design in
Figure 2. As mentioned before, an equivalent circuit is created
from the 4-bit full adder with register macrofunction and is
shown in Figure 12. This circuit can be integrated into
one EP1800 chip in the ALTERA design system. This EPLD device
is called CHIP3. The pin assignment for this chip is shown in
Figure 14.

The integration of this circuit places all data paths
between the ICs of Figure 13 inside one EP1800. This
reduces both board space and power consumption. The
propagation time is also reduced. The propagation time of the
original circuit is about 150 ns, whereas the tailored
EP1800 chip produces results with a maximum propagation
time of 75 ns. The propagation time of CHIP3
depends on how the ALTERA system assigns the logic expressions
to the macrocells. A different implementation of the adder
circuits may result in a different propagation time.

Figure 13. Circuit of TABLE[i] - TABLE[j]

Figure 14. CHIP3 pin assignment

Since the gate delays of the original design have no effect
in the EPLD implementation, the delay has to be measured from
the EP1800 timing model. One disadvantage of using the ALTERA
system to configure the macrocells is that it cannot guarantee
equal propagation delay for every output pin, because the
different types of macrocell have different timing. For
instance, the adder circuit for bit Y10 is assigned to a global
macrocell while the adder circuit for bit Y5 is assigned to a
local macrocell. This may result in different delays for Y10
and Y5. However, in the worst case, when the carry from the
least significant bit has to propagate to the most significant
bit, the LATCH signal has to wait until the outputs of all
adders become stable. Only then can the result be gated into
the registers.

The implementation of this circuit in one EP1800 uses
about 90 percent of the total gates of the chip. The idle part
of this EPLD may be used to implement another circuit if the
remaining I/O pins and macrocells are adequate.
However, this may cause interference among the assigned pins
to the extent that the benefit is usually not worth the
trouble.

The signals used in CHIP3 are the following:

• Input data signals: P0 to P15.

• Output data signals: Y0 to Y15.

• INV signal. This signal comes from the controller and is
used to generate the two's complement of the input data.

• CLR signal. This active low signal is used to clear the
output register to zero.

• LATCH signal. This control signal gates the output into
the register when the output data become available.

The data from the ACC RAM can also be passed through
CHIP3 to the Y register of the multiplier. The sequence of
operations is as follows:

• Assert the CLR signal. This causes Y = 0.

• Set P = ACC[i].

• Assert the LATCH signal. This causes Y = ACC[i].

G.  IMPLEMENTATION  OF  THE  FUNCTION  SIZE[i]  -  SIZE[j] 

This function is originally performed by the circuit in
Figure 15. An equivalent circuit is created from the ALTERA
primitives and macrofunctions. This circuit, shown in Figure
16, can be integrated into one EP1800, called CHIP1. The pin
assignment of the programmed EP1800 is shown in Figure 17. The
control sequence for this operation is the same as that of the
previous section. The only difference is that this function
has a 10-bit input and a 12-bit output. For this reason the INV
signal is also connected to the 2 most significant input bits
of the adder, so that the two's complement of SIZE[j] is formed
over the full 12-bit width of the adder.
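
A short Python sketch shows why tying INV to the two most significant adder inputs gives the correct 12-bit complement of a 10-bit value. The function names are hypothetical, and, as for CHIP3, the carry-in of 1 that completes the two's complement is an assumption rather than something the text states.

```python
MASK10 = 0x3FF
MASK12 = 0xFFF


def adder_operand(q10, inv):
    """Build the 12-bit adder operand from the 10-bit input Q and INV."""
    low = (q10 & MASK10) ^ (MASK10 if inv else 0)  # XOR gates on the 10 data bits
    high = 0b11 if inv else 0b00                   # INV wired to bits 10 and 11
    return (high << 10) | low


def size_difference(size_i, size_j):
    """Form SIZE[i] - SIZE[j] with the clear, add, add-complement sequence."""
    acc = 0                                               # CLR
    acc = (acc + adder_operand(size_i, 0)) & MASK12       # latch SIZE[i]
    acc = (acc + adder_operand(size_j, 1) + 1) & MASK12   # +1: assumed carry-in
    return acc


print(hex(adder_operand(1, 1)))   # 0xffe, the 12-bit one's complement of 1
print(size_difference(15, 1))     # 14
```

Without the two extra INV connections the upper bits of the operand would stay at zero, and the subtraction would be wrong by a multiple of 1024.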

The input data, Q (10 bits), comes from the PROM (IC26,27
of Figure 2), and the output data, U, goes into the address
register of another PROM (IC16,17 of Figure 2). The required
control signals of CHIP1 and CHIP3 are produced by the
costfunction state machine, which will be discussed later.

Figure 15. Circuit of SIZE[i] - SIZE[j]

Figure 16. Circuit of CHIP1

Figure 17. CHIP1 pin assignment

The  implementation  of  this  circuit  into  one  EP1800  uses 
only  20  percent  of  the  total  gates.  The  remaining  pins  and 
macrocells  may  be  used  to  implement  other  portions  of  the 
original  design. 

H.  INPUT  BUFFER  CIRCUITS 

The input buffers accept the values of i and j. The buffer
circuits of i and j are shown in Figure 18 and Figure 19
respectively. Two types of TTL ICs are used in these circuits:
the 74ALS574 (octal D-FF) and the 74ALS541 (octal line driver).

These circuits are implemented by using ALTERA primitives
that perform the same functions as the TTL ICs above. The
circuit is shown in Figure 20. The ALTERA Design System
converts this circuit into a JEDEC design file which is
programmed into one EP1800 chip. The programmed EP1800 chips
for i and j are called CHIP4i and CHIP4j respectively. The pin
assignment of this EPLD is shown in Figure 21.

The register of each buffer stores the 16-bit value of i or j.
The thirteen least significant bits are passed from this
register to the output. These bits are combined with the
RAM/MAC signal to create a 14-bit output (M00 to M13). Since
the values of i and j are output onto the same bus, the control
signals ENI and ENJ ensure that both values are not placed on
the bus at the same time. Bit M13 (in Figure 2) of the output
bus distinguishes between accesses to the ACC RAM and the TABLE
RAM.

Figure 18. Input buffer i

Figure 19. Input buffer j
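
The way each input buffer forms the 14-bit bus value can be sketched as follows. The function names are hypothetical, and the enables are treated as simple "driving / not driving" flags because the text does not give the polarity of ENI and ENJ. Placing RAM/MAC on M13 follows the description above.

```python
def bus_output(register16, ram_mac):
    """Return the 14-bit value M13..M00 produced by one input buffer."""
    low13 = register16 & 0x1FFF            # thirteen least significant bits
    return ((ram_mac & 1) << 13) | low13   # M13 carries the RAM/MAC signal


def drive_bus(i_reg, j_reg, en_i, en_j, ram_mac):
    """Model the shared bus; only one of the two buffers may drive it."""
    if en_i and en_j:
        raise ValueError("ENI and ENJ must not be active at the same time")
    if en_i:
        return bus_output(i_reg, ram_mac)
    if en_j:
        return bus_output(j_reg, ram_mac)
    return None  # neither buffer enabled: bus not driven


print(hex(drive_bus(0xFFFF, 0x0000, True, False, 0)))  # 0x1fff
print(hex(drive_bus(0x0005, 0x0009, False, True, 1)))  # 0x2009
```

The three upper bits of each 16-bit register never reach the bus, so only index values below 8192 can be addressed in this sketch.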


I.  CONTROL  AND  PC-INTERFACE 

The costfunction circuit performs the calculation by
issuing a sequence of control signals. These control signals
are generated by IC7 of Figure 2 according to the current
state. A total of 11 states is needed to complete the
calculation for each i and j. The state diagram is shown in
Figure 7 and the corresponding RTL is shown in Figure 6. The
state machine moves to the next state on each external clock
signal, so this clock also determines the speed of the
costfunction calculation. Initially, the costfunction circuit
communicates with the PC-BUS to load the data into the ACC RAM
and the TABLE RAM. This is done via the address decoder (IC40)
and the PC-INTERFACE circuit (IC33,39) of Figure 2.

By using the utilities in the ALTERA system, these 2
functions can be combined and implemented in one EP1800.
The Boolean equations of the two functions are entered in the
ALTERA Design File (ADF) format. This file is shown in
Appendix A. The ALTERA Design System converts this file into
a JEDEC file which is then used to program the EP1800
hardware.
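
The address decoding part of the design file in Appendix A reduces to a small lookup, sketched here in Python. The port addresses and the read or write qualification are read off the X300 to X30A equations and the QOUTE network term in the listing; the function name, the dictionary names and the use of logical (asserted = True) arguments are assumptions of the sketch. What each port is used for is not modeled.

```python
# Select names as they appear in the ADF listing, keyed by the 10-bit address.
WRITE_SELECTS = {0x300: "X300", 0x304: "X304", 0x306: "X306", 0x308: "X308"}
READ_SELECTS = {0x302: "QOUTE", 0x30A: "X30A"}


def decode(address, ior, iow, aen):
    """Return the set of selects active for one PC-bus cycle.

    ior, iow and aen are logical values (True = asserted); every
    equation in the listing requires AEN to be inactive.
    """
    if aen:
        return set()
    address &= 0x3FF                      # only ADR9..ADR0 are decoded
    active = set()
    if iow and address in WRITE_SELECTS:  # terms qualified by IOW
        active.add(WRITE_SELECTS[address])
    if ior and address in READ_SELECTS:   # terms qualified by IOR
        active.add(READ_SELECTS[address])
    return active


print(decode(0x300, ior=False, iow=True, aen=False))  # {'X300'}
print(decode(0x30A, ior=True, iow=False, aen=False))  # {'X30A'}
```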

In the VHDL modeling, a JEDEC file is read into the EP1800
model to tailor the programmed function. The simulation
results showed that the state machine alone can operate at a
clock speed of 20 MHz. In actual operation, the state machine
has to wait for the delays of data path elements, such as the
memory accesses for ACC[i] and TABLE[i]. Therefore, the clock
speed in the costfunction simulation is slower than the
controller alone would allow.

The implementation uses 40 percent of the macrocells of
one EP1800. The remaining macrocells are reserved for the
function of the asynchronous interface module (IC08), for
which information is not currently available. The rest of the
chip may include COUNTER16 (IC09) and the PC-DATA bus
(IC33,39) if enough pins and macrocells are available.

The ALTERA system also provides state table entry as an
alternative to Boolean equation entry. The states of the
costfunction can be entered in the State Machine File (SMF)
format, using a text editor. The SMF is converted to an ADF by
the ALTERA state machine converter. The ADF of the state
machine can then be merged with the ADF of the PC-interface to
produce an ADF of the combined functions. In our experience,
more ADF functions can be merged as long as enough I/O pins
and macrocells are available in the EP1800. The ALTERA Design
System determines the availability of pins and macrocells. The
final result is a JEDEC file which is used to program the EPLD
chip.

The pin assignment of the programmed EP1800 is shown in
Figure 22. The signals of the state machine are mapped as
follows:

• S0 connects to the CLR_REQ signal, which is active low.

• S1 connects to the ENABLE signal, which the asynchronous
chip uses to enable either the ENi signal or the ENj signal.

• S2 connects to the LATCH signal, which is active high.

• S3 connects to the INV signal, which is also active high.

• S4 connects to the RAM/MAC signal. The ACC RAM is accessed
when the RAM/MAC signal is low; otherwise the TABLE RAM is
accessed.

• S5 connects to the CLKY signal, which is asserted high.

• S6 connects to the CLKX signal, which is also asserted
high.

• S7 connects to the CF_DONE signal, which is asserted high.

In this chapter a number of circuits of the costfunction
were integrated into high density EP1800 chips. The procedures
and the rationale for partitioning the design were discussed.
In the next chapter some behavior models used in the final
simulation will be discussed.

III. SOME VHDL BEHAVIOR MODELS


A.  VHDL  BEHAVIOR  MODELING 

Modeling in VHDL can be performed at either the behavioral
level or the structural level. A behavioral model can be
defined as the functional interpretation of a particular
system. A structural model contains conceptual partitions
which decompose the model into functionally related sections.
A structural description of a piece of hardware is a
description of what its subcomponents are and how the
subcomponents are connected to each other. It is an important
characteristic of VHDL that a designer can mix behavioral
and structural descriptions at any level [Ref. 9].
This ability to mix description modes offers the designer
several advantages. First, the refinement from behavior to
structure need not proceed at the same rate for all portions
of the design. Therefore, at some stage a design may contain
both an abstract behavioral description for unrefined portions
and a structural breakdown for portions whose refinement is
known. Second, it is not necessary for a designer to simulate
everything at a low level of design. The portions of the design
that have already been verified at a low structural level can
be replaced with behavioral versions for incorporation into
larger simulations [Ref. 10].

In this section, two VHDL programs are written as the
behavior models of 2 components, the PROM (IC16,17)
and the AM29510 (IC25) of Figure 2. Since the functions of
these two components are known, it is easy to write VHDL
code that accepts inputs and produces outputs with the
characteristic delays of each component. At this point,
it is not necessary for the VHDL code to implement the
exact hardware architecture of the components. Initially, only
the simulation of the functionality of the components is of
concern.

To  implement  the  VHDL  behavior  model,  a  standard  logic 
package  from  the  VANTAGE  SYSTEM  is  used.  The  VANTAGE  SYSTEM  is 
a  VHDL  support  environment.  This  package  includes  some 
functions  that  can  be  utilized  to  convert  the  binary  value 
into  the  integer  value. 

B.  1/SIZE  PROM  MODELING 

The function of this memory circuit, shown in Figure
23, is to accept the value of SIZE[i] - SIZE[j]
from CHIP1. This value represents a fixed point fraction.
For example, an input value of 25 is interpreted
as 0.25 and the output will be 1/0.25 = 4. The PROM also
handles negative numbers represented in two's complement
form.

Figure 23. 1/SIZE PROM circuit

To implement the function of this PROM in a VHDL behavior
model, the logic input data is first converted to an integer
value by the subroutine f_logic_to_int2c [Ref. 11].
This integer value is then transformed into a fixed point
number. The result is obtained by taking the inverse of this
fixed point number. This result is still in the form of an
integer and is converted to a 16-bit logic output value
by the subroutine f_int_to_logic2c. Since this PROM
handles the inversion of both positive and negative
values, the VHDL code has to handle two's complement numbers
for both input and output. The access time of this PROM is
modeled as 100 ns. The code of this model is shown in
Appendix B.
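
The conversion steps described above can be sketched in Python. This is not the code of Appendix B: the helper names below are hypothetical stand-ins for f_logic_to_int2c and f_int_to_logic2c, the scale of 100 is inferred from the example "25 is interpreted as 0.25", and the truncating integer division and the zero-input result are assumptions. The 100 ns access time is not modeled.

```python
SCALE = 100  # an input of 25 is read as 0.25


def from_twos_complement(bits, width):
    """Convert a `width`-bit pattern to a signed integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits


def to_twos_complement(value, width):
    """Convert a signed integer to a `width`-bit pattern."""
    return value & ((1 << width) - 1)


def size_prom(address, in_width=12, out_width=16):
    """Return the 16-bit pattern for 1 / (address / SCALE)."""
    x = from_twos_complement(address, in_width)
    if x == 0:
        return 0                    # ASSUMPTION: the text does not cover zero
    magnitude = SCALE // abs(x)     # ASSUMPTION: integer result, truncated
    result = magnitude if x > 0 else -magnitude
    return to_twos_complement(result, out_width)


print(size_prom(25))                                # 4
print(hex(size_prom(to_twos_complement(-25, 12))))  # 0xfffc, i.e. -4
```

For the simulation test values SIZE[i] - SIZE[j] = 14, this sketch returns 7, since 1/0.14 is about 7.14 and the fraction is dropped.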

C.  AM29510  MODELING 

The AM29510 is a high-speed 16x16 bit multiplier/accumulator
(MAC) chip. Figure 24 shows the block diagram of this IC. The
X and Y input registers can accept 16-bit inputs in either
two's complement or unsigned magnitude format. An
additional register stores the Two's Complement (TC), Round
(RND), Accumulator (ACC), and Subtraction/Addition (SUB/ADD)
control bits. This register is clocked whenever the X and Y
input registers are clocked. The 35-bit accumulator/output
register contains the full 32-bit multiplier output, which is
sign extended or zero-filled based on the TC control bit. The
accumulator can also be preloaded from an external source
through the bi-directional P port. The operation of the
accumulator is controlled by the signals ACC, SUB/ADD, and
PREL. Each of the input registers and the output register has
an independent clock.

Figure 24. AM29510 block diagram
(adapted from AM29510 DATA SHEET)

In the VHDL behavior model of the AM29510, some details
of this IC are disregarded. For example, the rounding function
and the separation of output bits are not simulated. The main
purpose of this VHDL code for the AM29510 is only to support
the initial simulation of the operations of the costfunction.
The program accepts two 16-bit inputs and produces one 16-bit
output. In the model studied here only the necessary control
signals are implemented. This excludes the RND and PREL
signals. The LEM/OEM, LEX/OEX, and LEL/OEL signals are combined
into one LE/OE signal. The model is divided into two portions.
The first portion handles the timing characteristics of the
AM29510. The second portion handles the multiplication and
accumulation functions. Appendix C contains the
VHDL code of the AM29510.
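
The functional portion of such a simplified multiplier/accumulator model can be sketched in Python as follows. This is not the Appendix C code: the class and method names are hypothetical, the subtract mode is left out, timing is ignored, and returning the low 16 bits of the accumulator as the output is an assumption made to match the "one 16-bit output" mentioned above.

```python
def signed16(value):
    """Interpret a 16-bit pattern as a two's complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


class MacModel:
    """Behavioral sketch: X and Y registers plus an accumulator."""

    def __init__(self):
        self.x = 0
        self.y = 0
        self.accumulator = 0

    def load_x(self, value):
        self.x = value & 0xFFFF   # clocked by CLKX

    def load_y(self, value):
        self.y = value & 0xFFFF   # clocked by CLKY

    def clock_product(self, acc=0, tc=1):
        """Clock the output register; acc=1 adds to the previous contents."""
        x = signed16(self.x) if tc else self.x
        y = signed16(self.y) if tc else self.y
        product = x * y
        self.accumulator = self.accumulator + product if acc else product
        return self.accumulator & 0xFFFF   # ASSUMPTION: low 16 bits are output


mac = MacModel()
mac.load_x(14)
mac.load_y(7)
print(mac.clock_product(acc=0))   # 98
print(mac.clock_product(acc=1))   # 196
```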

D.  SUMMARY 

The behavioral modeling done in this research offers the
capability to simulate the initial functional requirements of
the costfunction circuit. An important property of each model
is that it produces a reasonable result within the specified
delay. An efficient behavioral model allows fast simulation
in high-level design. When the structural models have been
created, they can directly replace the behavioral models.
The simulation of a behavioral description is usually less
time consuming than that of a structural description, but a
structural model may provide a more accurate description of
the actual hardware operations in a chip. Therefore, both the
behavioral model and the structural model have advantages and
disadvantages in applications.

IV. VHDL SIMULATION AND ANALYSIS

A.  THE  SIMULATION  OF  COMBINED  CIRCUIT 

In this chapter, the implementations of the circuits from the
previous chapters are combined and simulated in VHDL. The
observation concentrates on the operations of CHIP3, CHIP1,
the 1/SIZE PROM, and the AM29510 under the control signals
from the CONTROL chip. A VHDL testbench program was created
for the testing [Ref. 12] and is given in Appendix D. This
program includes all the partitioned circuit components. A
process handles the values supplied from the RAM to CHIP3
and CHIP1. The program uses the EP1800 structural model
developed by a previous NPS thesis student. At simulation
time, each EP1800 is personalized by reading a particular
JEDEC file for its designed function. There are 3 JEDEC files
to be read, for the CHIP3, CHIP1, and CONTROL modules
respectively. Names are also assigned to each EP1800 pin
according to the pin mapping report file from the ALTERA
system.

When the simulation starts, the necessary signals and
variables are assigned their initial values. Then the
PC_CLR signal is asserted in order to reset the state machine
to state 0. The present state changes to the next state
after the rising edge of the CLK signal. The final state is
state 9. In this state, the result is obtained at the output
of the AM29510 together with the CF_DONE signal. The result is
available correctly up to 90 nanoseconds after the rising edge
of the CF_DONE signal. The simulation result is shown in
Appendix E. The test values for the simulation are as
follows:

            Binary              Decimal
TABLE[i]  = 0000000000001111    15
TABLE[j]  = 0000000000000001    1
SIZE[i]   = 000000001111        15
SIZE[j]   = 000000000001        1
ACC[i]    = 0000000000000010    2

The result is COSTFUNCTION[i,j] = 0000000001100000, which
translates to 96. The actual result should be 98.

Since  the  content  of  the  1/SIZE  PROM  is  not  available  to 
us  at  the  present  time,  a  simple  mathematical  inversion  was 
assumed.  The  VHDL  PROM  implementation  uses  integer 
calculations  internally.  This  should  cause  a  slight  difference 
between  the  VHDL  simulated  result  and  the  anticipated  PROM 
result.  However,  the  simulation  produces  a  close  enough  result 
to  the  hardware  implementation. 


B.  TIMING  DELAY  ANALYSIS 

The timing characteristics of each EP1800 depend on how
the EPLD is programmed. Although the EP1800 does not have
individual gate delays, there are still delays
associated with the macrocell, the feedback path, and
the I/O control section. In order to operate the circuit
correctly, the state machine (CONTROL) must produce each state
control signal at the appropriate time, after the longest delay
of the data path in the EPLDs has elapsed. In the simulation,
the delays of CHIP3 and CHIP1 are measured. The longest delay
from input to output is 1200 nanoseconds.

Most of the delay occurs in CHIP3 and CHIP1. These two
EPLDs implement the 16-bit and the 12-bit full adders with
registers. The worst case delay is caused by the ripple carry
propagation from the least significant bit to the most
significant bit. The EP1800 has 3 types of macrocell with
different delay characteristics. Since different bits of the
full adder may be programmed into different types of macrocell,
the output bits have different delays. The CONTROL chip has to
ensure that the output signals are stable before it latches
the result and changes state.

C.  CIRCUIT  SPEED  ANALYSIS 

In this simulation, the clock speed of the state machine
is varied experimentally. A suitable clock period is 300
nanoseconds. Compared to the original design with a clock
period of 25 nanoseconds, this implementation is considerably
slower. This is the trade-off in the design for high
integration. However, the board space as well as the power
consumption is reduced. The design can also be changed by
re-programming the EPLD, and the EPLD can be replaced by a
faster version in the future.
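
The size of this speed penalty follows directly from the two clock periods quoted above, as the short calculation below shows. The estimate of the time per (i, j) pair assumes that each of the 11 states mentioned in the controller section takes exactly one clock period; that assumption is not stated in the text.

```python
ORIGINAL_PERIOD_NS = 25   # clock period of the original TTL design
EPLD_PERIOD_NS = 300      # clock period found suitable in the simulation
NUM_STATES = 11           # states per (i, j) calculation


def frequency_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


slowdown = EPLD_PERIOD_NS / ORIGINAL_PERIOD_NS
time_per_pair_ns = NUM_STATES * EPLD_PERIOD_NS  # assumes one clock per state

print(frequency_mhz(ORIGINAL_PERIOD_NS))        # 40.0 MHz
print(round(frequency_mhz(EPLD_PERIOD_NS), 2))  # 3.33 MHz
print(slowdown)                                 # 12.0 times slower
print(time_per_pair_ns)                         # 3300 ns per (i, j) pair
```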

D.  IMPROVEMENT 

There are several ways to improve the efficiency of
the integrated implementation. One is to redesign the adder
circuit. A carry-look-ahead adder circuit may be desirable and
can be easily designed. Another way is to use the idle
macrocells and logic for more integration. The added
integration may include some functions from the upper level
circuitry. The efficiency can also be improved if the EPLD pin
assignment is arranged manually so that the delay between
macrocells is minimized. However, this is inconvenient and
time consuming. The simulation of the testbench takes almost
4 hours to complete. For the full implementation of the
costfunction PCB, the simulation may take 6 to 8 hours. It is
expected that the simulation time can be greatly reduced on a
faster computer system.
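
The carry-look-ahead adder suggested above can be illustrated with a 4-bit Python sketch. It uses the standard generate and propagate equations, so every carry is a two-level function of the inputs instead of waiting for the previous bit. The function name `cla4` is hypothetical, and the sketch does not address whether these wider product terms fit the EP1800 macrocells.

```python
def cla4(a, b, carry_in=0):
    """4-bit carry-look-ahead addition: returns (sum, carry_out)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]  # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]  # propagate
    c0 = carry_in
    # Each carry is written directly in terms of g, p and c0 (no ripple).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (p[i] ^ carries[i]) << i   # sum bit = propagate XOR carry
    return total, c4


print(cla4(0b1111, 0b0001))   # (0, 1)
print(cla4(0b0101, 0b0011))   # (8, 0)
```

The look-ahead form trades delay for logic: the carry into bit 3 already needs four product terms, and the count grows with every additional bit.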
V. CONCLUSIONS


A.  CONCLUSIONS  AND  FUTURE  RESEARCH 

In this research, a software costfunction routine is
converted into a Register Transfer Language (RTL) description.
This RTL description can be used to design the hardware. The
original costfunction circuit in TTL logic is also redesigned
to achieve high integration using EP1800s. The integration
is based on the partition of the costfunction circuit into
several modules, each of which corresponds to a statement in
the costfunction software. A VHDL structural model of the
EP1800 is used to simulate the function modules CHIP3, CHIP1,
CONTROL, CHIP4i, and CHIP4j. An ALTERA JEDEC file is read into
the model to personalize each function. The PROMs and the
AM29510 are described by their behavior models in VHDL in
the final simulation experiment.

The total simulation of the major operation of the
costfunction circuit is done at the end. The result is
slightly different from the anticipated value due to the
integer calculation in the 1/SIZE PROM VHDL model. This can be
improved if more information about this PROM can be obtained.
The main drawback of the redesigned costfunction circuit is
its speed, due to the long delay of the EPLDs. The total
simulation turn-around time on a MicroVAX is also
discouraging.

For future research, it is desirable to simulate the
remaining part of the original design, including the pipeline
and sequence generator. After that, the graph partition
hardware is the final target for the total simulation. Since
there are many unused macrocells left in the revised
costfunction circuit, these can be used to integrate higher
level functions. It is also hoped that future EPLD technology
can provide a reduced delay time for the device. At
that time, the simulation of this revised design will produce

CONTROL AND PC INTERFACE


MING  IMVIDHAYA 
NAVAL  POST  GRADUATE  SCHOOL 
4/24/90 
1.00 
B 

EP1800 

P22V10  COST_FUNCTION  &  PC_INTERFACE 

OPTIONS: TURBO=ON, SECURITY=OFF

PART: EP1800

INPUTS: CLK,/PCCLR,
        ADR9,ADR8,ADR7,ADR6,ADR5,ADR4,ADR3,ADR2,ADR1,ADR0,
        /IOR,/IOW,AEN,/QIN

OUTPUTS: S8p@23,S7p@60,S6p@59,S5p@58,S4p@57,S3p@13,S2p@12,
         S1p@11,S0p@10,
         /X304@44,/X306@45,/X308@46,/X30A@47,/X300@24,
         /QOUT@25,QOUTE@26

NETWORK: 

CLK  =  INP(CLK) 

CLOCK  =  CLKB(CLK) 

/PCCLR  =  INP(/PCCLR) 

PCCLR  =  NOT(/PCCLR) 

S8p,S8 = RORF(S8c,CLOCK,GND,GND,VCC)
S7p,S7 = RORF(S7c,CLOCK,GND,GND,VCC)
S6p,S6 = RORF(S6c,CLOCK,GND,GND,VCC)
S5p,S5 = RORF(S5c,CLOCK,GND,GND,VCC)
S4p,S4 = RORF(S4c,CLOCK,GND,GND,VCC)
S3p,S3 = RORF(S3c,CLOCK,GND,GND,VCC)
S2p,S2 = RORF(S2c,CLOCK,GND,GND,VCC)
S1p,S1 = RORF(S1c,CLOCK,GND,GND,VCC)
S0p,S0 = RORF(S0c,CLOCK,GND,GND,VCC)


ADR9 = INP(ADR9)
ADR8 = INP(ADR8)
ADR7 = INP(ADR7)
nADR7 = NOT(ADR7)
ADR6 = INP(ADR6)
nADR6 = NOT(ADR6)
ADR5 = INP(ADR5)
nADR5 = NOT(ADR5)
ADR4 = INP(ADR4)
nADR4 = NOT(ADR4)
ADR3 = INP(ADR3)
nADR3 = NOT(ADR3)
ADR2 = INP(ADR2)
nADR2 = NOT(ADR2)
ADR1 = INP(ADR1)
ADR0 = INP(ADR0)
nADR0 = NOT(ADR0)
nIOR = INP(/IOR)
IOR = NOT(nIOR)
nIOW = INP(/IOW)
IOW = NOT(nIOW)
AEN = INP(AEN)
nAEN = NOT(AEN)
nQIN = INP(/QIN)
QIN = NOT(nQIN)
QOUTEc = AND(IOR,nAEN,ADR9,ADR8,nADR7,nADR6,
             nADR5,nADR4,nADR3,nADR2,ADR1,nADR0)
QOUTE = CONF(QOUTEc,VCC)
/X304 = CONF(nX304c,ADR9)
nX304c = NOT(X304)
/X306 = CONF(nX306c,ADR9)
nX306c = NOT(X306)
/X308 = CONF(nX308c,ADR9)
nX308c = NOT(X308)
/X30A = CONF(nX30Ac,ADR9)
nX30Ac = NOT(X30A)
/X300 = CONF(nX300c,ADR9)
nX300c = NOT(X300)
/QOUT = CONF(nQOUTc,QOUTEc)
nQOUTc = NOT(QOUT)


EQUATIONS:

S8c = /PCCLR*/S8*/S7*/S6*/S5*/S4*/S3*/S2*S1*S0 +
      /PCCLR*/S8*/S7*/S6*S5*S4*/S3*/S2*S1*/S0 +
      /PCCLR*/S8*/S7*/S6*/S5*S4*/S3*S2*/S1*S0;

S7c = /PCCLR*/S8*/S7*/S6*/S5*/S4*/S3*/S2*/S1*/S0;

S6c = /PCCLR*S8*/S7*/S6*/S5*/S4*/S3*/S2*S1*S0 +
      /PCCLR*/S8*/S7*/S6*/S5*S4*S3*S2*/S1*S0;

S5c = /PCCLR*/S8*/S7*S6*/S5*/S4*/S3*S2*S1*S0 +
      /PCCLR*/S8*/S7*/S6*/S5*S4*S3*S2*/S1*S0 +
      /PCCLR*/S8*/S7*/S6*/S5*/S4*/S3*/S2*/S1*/S0;

S4c = /PCCLR*/S8*/S7*S6*/S5*/S4*/S3*S2*S1*S0 +
      /PCCLR*/S8*/S7*/S6*S5*S4*/S3*/S2*S1*/S0 +
      /PCCLR*S8*/S7*/S6*/S5*S4*/S3*/S2*S1*S0 +
      /PCCLR*/S8*/S7*/S6*/S5*S4*S2*/S1*S0 +
      /PCCLR*S8*/S7*/S6*/S5*S4*S3*/S2*/S1*S0;

S3c = /PCCLR*/S8*/S7*/S6*/S5*S4*/S3*S2*/S1*S0 +
      /PCCLR*S8*/S7*/S6*/S5*S4*S3*/S2*/S1*S0;

S2c = /PCCLR*S8*/S7*/S6*/S5*/S3*/S2*S1*S0 +
      /PCCLR*S8*/S7*/S6*/S5*S4*S3*/S2*/S1*S0;

S1c = PCCLR +
      /S7*/S6*/S5*/S4*/S3*/S2*S1*S0 +
      /S8*/S7*S6*/S5*/S4*/S3*S2*S1*S0 +
      /S8*/S7*/S6*S5*S4*/S3*/S2*S1*/S0 +
      /S8*S7*/S6*S5*/S4*/S3*/S2*/S1*/S0;

S0c = PCCLR +
      /S7*/S6*/S5*/S4*/S3*/S2*S1*S0 +
      /S8*/S7*/S6*S5*S4*/S3*/S2*S1*/S0 +
      S8*/S7*/S6*/S5*/S3*/S2*S1*S0 +
      /S8*/S7*/S6*/S5*S4*/S3*S2*/S1*S0 +
      S8*/S7*/S6*/S5*S4*S3*/S2*/S1*S0 +
      /S8*S7*/S6*S5*/S4*/S3*/S2*/S1*/S0;

X300 = IOW*/AEN*ADR9*ADR8*/ADR7*/ADR6*/ADR5*/ADR4*
       /ADR3*/ADR2*/ADR1*/ADR0;

X304 = IOW*/AEN*ADR9*ADR8*/ADR7*/ADR6*/ADR5*/ADR4*
       /ADR3*ADR2*/ADR1*/ADR0;

X306 = IOW*/AEN*ADR9*ADR8*/ADR7*/ADR6*/ADR5*/ADR4*
       /ADR3*ADR2*ADR1*/ADR0;

X308 = IOW*/AEN*ADR9*ADR8*/ADR7*/ADR6*/ADR5*/ADR4*
       ADR3*/ADR2*/ADR1*/ADR0;

X30A = IOR*/AEN*ADR9*ADR8*/ADR7*/ADR6*/ADR5*/ADR4*
       ADR3*/ADR2*ADR1*/ADR0;

QOUT = QIN;

END$ 
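
As a reading aid, the PC-interface address decoding expressed by the QOUTEc network entry and the X300 to X30A equations above can be restated in software. The Python sketch below is not part of the original design files. The function name `decode_strobes`, the table names, and the use of active-high booleans for the already-inverted IOR, IOW and AEN signals are assumptions made for illustration only. It models the combinational decode terms, not the tri-state output enables or the state machine.

```python
# Illustrative model of the I/O address decode in the CONTROL chip listing.
# Names and structure are hypothetical; only the address/strobe pairs are
# taken from the ADF equations (10 address bits, ADR9..ADR0).

WRITE_STROBES = {0x300: "X300", 0x304: "X304", 0x306: "X306", 0x308: "X308"}
READ_STROBES = {0x302: "QOUTEc", 0x30A: "X30A"}


def decode_strobes(addr, ior, iow, aen):
    """Return the set of decode terms that are true.

    addr : integer I/O address (only the low 10 bits are decoded)
    ior  : True when an I/O read is in progress (/IOR pin low)
    iow  : True when an I/O write is in progress (/IOW pin low)
    aen  : True when AEN is high (DMA cycle), which blocks every term
    """
    if aen:
        return set()
    addr &= 0x3FF  # the listing decodes ADR9..ADR0 only
    active = set()
    if iow and addr in WRITE_STROBES:
        active.add(WRITE_STROBES[addr])
    if ior and addr in READ_STROBES:
        active.add(READ_STROBES[addr])
    return active
```

For example, a write to address 0x304 with AEN low yields only the X304 term, while the same access with AEN high yields nothing.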
APPENDIX B - 1/SIZE PROM BEHAVIOR MODEL


-- SIZEPROM.VHD
library SHU;
USE SHU.EPROM_PACK.ALL;
USE std.std_logic.ALL;
USE std.std_ttl.ALL;
USE work.intpack_ttl.ALL;
USE work.mingpack.ALL;

ENTITY sizeprom IS
  PORT (x0, x1, x2, x3, x4, x5, x6, x7,
        x8, x9, x10, x11, x12, x13, x14, x15 : IN tri_state;
        y0, y1, y2, y3, y4, y5, y6, y7,
        y8, y9, y10, y11, y12, y13, y14, y15 : OUT tri_state);
END sizeprom;

ARCHITECTURE full OF sizeprom IS
  CONSTANT tac : TIME := 100 ns;
  SIGNAL xx0, xx1, xx2, xx3, xx4, xx5, xx6, xx7,
         xx8, xx9, xx10, xx11, xx12, xx13, xx14, xx15,
         yy0, yy1, yy2, yy3, yy4, yy5, yy6, yy7,
         yy8, yy9, yy10, yy11, yy12, yy13, yy14, yy15 : t_wlogic;

BEGIN 

  -- input delay processing
  xx0  <= f_ttl(tri_to_tstate(x0))  AFTER tac;
  xx1  <= f_ttl(tri_to_tstate(x1))  AFTER tac;
  xx2  <= f_ttl(tri_to_tstate(x2))  AFTER tac;
  xx3  <= f_ttl(tri_to_tstate(x3))  AFTER tac;
  xx4  <= f_ttl(tri_to_tstate(x4))  AFTER tac;
  xx5  <= f_ttl(tri_to_tstate(x5))  AFTER tac;
  xx6  <= f_ttl(tri_to_tstate(x6))  AFTER tac;
  xx7  <= f_ttl(tri_to_tstate(x7))  AFTER tac;
  xx8  <= f_ttl(tri_to_tstate(x8))  AFTER tac;
  xx9  <= f_ttl(tri_to_tstate(x9))  AFTER tac;
  xx10 <= f_ttl(tri_to_tstate(x10)) AFTER tac;
  xx11 <= f_ttl(tri_to_tstate(x11)) AFTER tac;
  xx12 <= f_ttl(tri_to_tstate(x12)) AFTER tac;
  xx13 <= f_ttl(tri_to_tstate(x13)) AFTER tac;
  xx14 <= f_ttl(tri_to_tstate(x14)) AFTER tac;
  xx15 <= f_ttl(tri_to_tstate(x15)) AFTER tac;


  -- operation
  PROCESS (xx0, xx1, xx2, xx3, xx4, xx5, xx6, xx7,
           xx8, xx9, xx10, xx11, xx12, xx13, xx14, xx15)
    VARIABLE state   : t_logarray(1 TO 16);
    VARIABLE x       : integer := 0;
    VARIABLE xr      : real := 0.0;
    VARIABLE y       : integer := 0;
    VARIABLE p       : integer := 0;
    VARIABLE unknown : boolean := true;
  BEGIN
    -- pickup x data
    state(16) := xx0;
    state(15) := xx1;
    state(14) := xx2;
    state(13) := xx3;
    state(12) := xx4;
    state(11) := xx5;
    state(10) := xx6;
    state(9)  := xx7;
    state(8)  := xx8;
    state(7)  := xx9;
    state(6)  := xx10;
    state(5)  := xx11;
    state(4)  := xx12;
    state(3)  := xx13;
    state(2)  := xx14;
    state(1)  := xx15;
    f_logic_to_int2c(state, unknown, x);
    xr := REAL(x);
    WHILE xr > 1.0 LOOP
      xr := xr*0.1;
    END LOOP;
    if xr = 0.0 then
      unknown := true;
    else
      x := INTEGER(1.0/xr);
    end if;
    IF NOT unknown THEN
      f_intl_to_logic2c(x, state, ttl);
    END IF;

    -- assign values to output signals
    IF NOT unknown THEN
      yy0  <= state(16);
      yy1  <= state(15);
      yy2  <= state(14);
      yy3  <= state(13);
      yy4  <= state(12);
      yy5  <= state(11);
      yy6  <= state(10);
      yy7  <= state(9);
      yy8  <= state(8);
      yy9  <= state(7);
      yy10 <= state(6);
      yy11 <= state(5);
      yy12 <= state(4);
      yy13 <= state(3);
      yy14 <= state(2);
      yy15 <= state(1);
    ELSE
      yy0  <= FX;
      yy1  <= FX;
      yy2  <= FX;
      yy3  <= FX;
      yy4  <= FX;
      yy5  <= FX;
      yy6  <= FX;
      yy7  <= FX;
      yy8  <= FX;
      yy9  <= FX;
      yy10 <= FX;
      yy11 <= FX;
      yy12 <= FX;
      yy13 <= FX;
      yy14 <= FX;
      yy15 <= FX;
    END IF;
  END PROCESS;

  y0  <= tstate_to_tri(f_state(yy0));
  y1  <= tstate_to_tri(f_state(yy1));
  y2  <= tstate_to_tri(f_state(yy2));
  y3  <= tstate_to_tri(f_state(yy3));
  y4  <= tstate_to_tri(f_state(yy4));
  y5  <= tstate_to_tri(f_state(yy5));
  y6  <= tstate_to_tri(f_state(yy6));
  y7  <= tstate_to_tri(f_state(yy7));
  y8  <= tstate_to_tri(f_state(yy8));
  y9  <= tstate_to_tri(f_state(yy9));
  y10 <= tstate_to_tri(f_state(yy10));
  y11 <= tstate_to_tri(f_state(yy11));
  y12 <= tstate_to_tri(f_state(yy12));
  y13 <= tstate_to_tri(f_state(yy13));
  y14 <= tstate_to_tri(f_state(yy14));
  y15 <= tstate_to_tri(f_state(yy15));

END  full; 
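
The arithmetic at the core of the 1/SIZE PROM process above (scale the input down by repeated multiplication with 0.1 until it is no larger than 1.0, then take the reciprocal) may be easier to follow outside VHDL. The Python sketch below is an illustration, not the author's code. The function name `size_prom` and the use of `None` for the "unknown" case are assumptions. Python's `round` rounds exact halves to the nearest even integer, which may differ from how a given VHDL simulator converts a real value ending in .5 to an integer.

```python
# Illustrative restatement of the arithmetic in the SIZEPROM process.
# size_prom() and the None return value are hypothetical conventions.


def size_prom(x):
    """Mirror the loop and reciprocal of the behavior model.

    x is the integer already decoded from the 16 input bits.
    Returns None where the model flags the result as unknown (x == 0).
    """
    xr = float(x)
    # Same loop as the model: shrink xr until it is at most 1.0.
    while xr > 1.0:
        xr = xr * 0.1
    if xr == 0.0:
        return None  # the model sets unknown := true
    # The model uses INTEGER(1.0/xr); round() is the closest Python analogue.
    return round(1.0 / xr)
```

Under these assumptions an input of 5 is scaled to 0.5 and yields 2, and an input of 20 is scaled to 0.2 and yields 5.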
APPENDIX C - AM29510 BEHAVIOR MODEL


A.  TIMING  BEHAVIOR 

-- AM29510.VHD
library SHU;
USE SHU.EPROM_PACK.ALL;
USE std.std_logic.ALL;
USE std.std_ttl.ALL;
USE work.intpack_ttl.ALL;
USE work.MINGPACK.ALL;

ENTITY AM29510 IS
  PORT (x0, x1, x2, x3, x4, x5, x6, x7,
        x8, x9, x10, x11, x12, x13, x14, x15,
        y0, y1, y2, y3, y4, y5, y6, y7,
        y8, y9, y10, y11, y12, y13, y14, y15,
        tc, acc, sub_add, le_oe,
        clkx, clky, clkp : IN tri_state;
        p16, p17, p18, p19, p20, p21, p22, p23,
        p24, p25, p26, p27, p28, p29, p30, p31 : OUT tri_state);
END AM29510;

ARCHITECTURE full OF AM29510 IS

  CONSTANT tma    : TIME := 50 ns;
  CONSTANT ts     : TIME := 0 ns;   -- 25
  CONSTANT th     : TIME := 0 ns;   -- 5
  CONSTANT tsprel : TIME := 25 ns;
  CONSTANT thprel : TIME := 0 ns;
  CONSTANT tpwh   : TIME := 20 ns;
  CONSTANT twpl   : TIME := 20 ns;
  CONSTANT tpdp   : TIME := 40 ns;
  CONSTANT tpdy   : TIME := 40 ns;
  CONSTANT tphz   : TIME := 35 ns;
  CONSTANT tplz   : TIME := 35 ns;
  CONSTANT tpzh   : TIME := 40 ns;
  CONSTANT tpzl   : TIME := 40 ns;
  CONSTANT thcl   : TIME := 0 ns;

  COMPONENT multiplier_16_bit
    PORT (x0, x1, x2, x3, x4, x5, x6, x7,
          x8, x9, x10, x11, x12, x13, x14, x15,
          y0, y1, y2, y3, y4, y5, y6, y7,
          y8, y9, y10, y11, y12, y13, y14, y15,
          tc, acc, sub_add, le_oe,
          clkx, clky, clkp : IN t_wlogic := F0;
          p16, p17, p18, p19, p20, p21, p22, p23,
          p24, p25, p26, p27, p28, p29,
          p30, p31 : OUT t_wlogic := F0);
  END COMPONENT;

  FOR ALL : multiplier_16_bit
    USE ENTITY work.multiplier_16_bit(full);

  signal xx0, xx1, xx2, xx3, xx4, xx5, xx6, xx7,
         xx8, xx9, xx10, xx11, xx12, xx13, xx14, xx15,
         yy0, yy1, yy2, yy3, yy4, yy5, yy6, yy7,
         yy8, yy9, yy10, yy11, yy12, yy13, yy14, yy15,
         pp16, pp17, pp18, pp19, pp20, pp21, pp22, pp23,
         pp24, pp25, pp26, pp27, pp28, pp29, pp30, pp31,
         tcl, accl, sub_addl, le_oel,
         clkxl, clkyl, clkpl : t_wlogic := F0;

BEGIN

  MUL : multiplier_16_bit
    PORT MAP (xx0, xx1, xx2, xx3, xx4, xx5, xx6, xx7,
              xx8, xx9, xx10, xx11, xx12, xx13, xx14, xx15,
              yy0, yy1, yy2, yy3, yy4, yy5, yy6, yy7,
              yy8, yy9, yy10, yy11, yy12, yy13, yy14, yy15,
              tcl, accl, sub_addl, le_oel, clkxl, clkyl, clkpl,
              pp16, pp17, pp18, pp19, pp20, pp21, pp22, pp23,
              pp24, pp25, pp26, pp27, pp28, pp29, pp30, pp31);

  -- input delay processing
  tcl      <= f_ttl(tri_to_tstate(tc));
  accl     <= f_ttl(tri_to_tstate(acc));
  sub_addl <= f_ttl(tri_to_tstate(sub_add));
  le_oel   <= f_ttl(tri_to_tstate(le_oe));
  clkxl    <= f_ttl(tri_to_tstate(clkx));
  clkyl    <= f_ttl(tri_to_tstate(clky));
  clkpl    <= f_ttl(tri_to_tstate(clkp));

  xx0  <= f_ttl(tri_to_tstate(x0));
  xx1  <= f_ttl(tri_to_tstate(x1));
  xx2  <= f_ttl(tri_to_tstate(x2));
  xx3  <= f_ttl(tri_to_tstate(x3));
  xx4  <= f_ttl(tri_to_tstate(x4));
  xx5  <= f_ttl(tri_to_tstate(x5));
  xx6  <= f_ttl(tri_to_tstate(x6));
  xx7  <= f_ttl(tri_to_tstate(x7));
  xx8  <= f_ttl(tri_to_tstate(x8));
  xx9  <= f_ttl(tri_to_tstate(x9));
  xx10 <= f_ttl(tri_to_tstate(x10));
  xx11 <= f_ttl(tri_to_tstate(x11));
  xx12 <= f_ttl(tri_to_tstate(x12));
  xx13 <= f_ttl(tri_to_tstate(x13));
  xx14 <= f_ttl(tri_to_tstate(x14));
  xx15 <= f_ttl(tri_to_tstate(x15));

  yy0  <= f_ttl(tri_to_tstate(y0));
  yy1  <= f_ttl(tri_to_tstate(y1));
  yy2  <= f_ttl(tri_to_tstate(y2));
  yy3  <= f_ttl(tri_to_tstate(y3));
  yy4  <= f_ttl(tri_to_tstate(y4));
  yy5  <= f_ttl(tri_to_tstate(y5));
  yy6  <= f_ttl(tri_to_tstate(y6));
  yy7  <= f_ttl(tri_to_tstate(y7));
  yy8  <= f_ttl(tri_to_tstate(y8));
  yy9  <= f_ttl(tri_to_tstate(y9));
  yy10 <= f_ttl(tri_to_tstate(y10));
  yy11 <= f_ttl(tri_to_tstate(y11));
  yy12 <= f_ttl(tri_to_tstate(y12));
  yy13 <= f_ttl(tri_to_tstate(y13));
  yy14 <= f_ttl(tri_to_tstate(y14));
  yy15 <= f_ttl(tri_to_tstate(y15));

  p16 <= tstate_to_tri(f_state(pp16)) AFTER tma+tpdp;
  p17 <= tstate_to_tri(f_state(pp17)) AFTER tma+tpdp;
  p18 <= tstate_to_tri(f_state(pp18)) AFTER tma+tpdp;
  p19 <= tstate_to_tri(f_state(pp19)) AFTER tma+tpdp;
  p20 <= tstate_to_tri(f_state(pp20)) AFTER tma+tpdp;
  p21 <= tstate_to_tri(f_state(pp21)) AFTER tma+tpdp;
  p22 <= tstate_to_tri(f_state(pp22)) AFTER tma+tpdp;
  p23 <= tstate_to_tri(f_state(pp23)) AFTER tma+tpdp;
  p24 <= tstate_to_tri(f_state(pp24)) AFTER tma+tpdp;
  p25 <= tstate_to_tri(f_state(pp25)) AFTER tma+tpdp;
  p26 <= tstate_to_tri(f_state(pp26)) AFTER tma+tpdp;
  p27 <= tstate_to_tri(f_state(pp27)) AFTER tma+tpdp;
  p28 <= tstate_to_tri(f_state(pp28)) AFTER tma+tpdp;
  p29 <= tstate_to_tri(f_state(pp29)) AFTER tma+tpdp;
  p30 <= tstate_to_tri(f_state(pp30)) AFTER tma+tpdp;
  p31 <= tstate_to_tri(f_state(pp31)) AFTER tma+tpdp;

END full;



B.  MULTIPLICATION  BEHAVIOR 

-- MULTIPLY.VHD
USE std.std_logic.ALL;
USE std.std_ttl.ALL;
USE work.intpack_ttl.ALL;

ENTITY multiplier_16_bit IS
  PORT (x0, x1, x2, x3, x4, x5, x6, x7,
        x8, x9, x10, x11, x12, x13, x14, x15,
        y0, y1, y2, y3, y4, y5, y6, y7,
        y8, y9, y10, y11, y12, y13, y14, y15,
        tc, acc, sub_add, le_oe,
        clkx, clky, clkp : IN t_wlogic;
        p16, p17, p18, p19, p20, p21, p22, p23,
        p24, p25, p26, p27, p28, p29, p30, p31 : OUT t_wlogic);
END multiplier_16_bit;

ARCHITECTURE full OF multiplier_16_bit IS
BEGIN

  -- operation
  PROCESS (clkx, clky, clkp)
    VARIABLE state : t_logarray(1 TO 16);
    VARIABLE x : integer := 0;
    VARIABLE y : integer := 0;
    VARIABLE p : integer := 0;
    VARIABLE accr, tcr, sub_add_r : t_wlogic;
    VARIABLE unknownx, unknowny : boolean := true;
  BEGIN
    -- pickup x data
    IF (f_rising_edge(clkx)) THEN
      state(16) := x0;
      state(15) := x1;
      state(14) := x2;
      state(13) := x3;
      state(12) := x4;
      state(11) := x5;
      state(10) := x6;
      state(9)  := x7;
      state(8)  := x8;
      state(7)  := x9;
      state(6)  := x10;
      state(5)  := x11;
      state(4)  := x12;
      state(3)  := x13;
      state(2)  := x14;
      state(1)  := x15;
      IF tcr = F0 THEN
        f_logictoint(state, unknownx, x);
      ELSE
        f_logic_to_int2c(state, unknownx, x);
      END IF;
    END IF;

    -- pickup y data
    IF f_rising_edge(clky) THEN
      state(16) := y0;
      state(15) := y1;
      state(14) := y2;
      state(13) := y3;
      state(12) := y4;
      state(11) := y5;
      state(10) := y6;
      state(9)  := y7;
      state(8)  := y8;
      state(7)  := y9;
      state(6)  := y10;
      state(5)  := y11;
      state(4)  := y12;
      state(3)  := y13;
      state(2)  := y14;
      state(1)  := y15;
      IF tcr = F0 THEN
        f_logictoint(state, unknowny, y);
      ELSE
        f_logic_to_int2c(state, unknowny, y);
      END IF;
    END IF;

    IF (f_rising_edge(clkx) OR f_rising_edge(clky)) THEN
      IF ((NOT unknowny) AND (unknownx)
          AND (LE_OE = F1) AND (accr = F0)) THEN
        p := y;
      ELSIF (NOT unknownx) AND (NOT unknowny) THEN
        IF (accr = F1) THEN
          IF (sub_add_r = F1) THEN
            p := x * y - p;
          ELSE
            p := p + x * y;
          END IF;
        ELSE  -- accr = F0
          IF (LE_OE = F0) THEN
            p := x * y;
          ELSE
            p := y;
          END IF;
        END IF;
      END IF;
      IF tcr = F0 THEN
        f_inttologic(p, state, ttl);
      ELSE
        f_intl_to_logic2c(p, state, ttl);
      END IF;
      accr := acc;
      tcr := tc;
      sub_add_r := sub_add;
    END IF;

    -- assign values to output signals
    IF f_rising_edge(clkp) THEN
      IF (NOT unknowny) THEN
        p16 <= state(16);
        p17 <= state(15);
        p18 <= state(14);
        p19 <= state(13);
        p20 <= state(12);
        p21 <= state(11);
        p22 <= state(10);
        p23 <= state(9);
        p24 <= state(8);
        p25 <= state(7);
        p26 <= state(6);
        p27 <= state(5);
        p28 <= state(4);
        p29 <= state(3);
        p30 <= state(2);
        p31 <= state(1);
      ELSE
        p16 <= FX;
        p17 <= FX;
        p18 <= FX;
        p19 <= FX;
        p20 <= FX;
        p21 <= FX;
        p22 <= FX;
        p23 <= FX;
        p24 <= FX;
        p25 <= FX;
        p26 <= FX;
        p27 <= FX;
        p28 <= FX;
        p29 <= FX;
        p30 <= FX;
        p31 <= FX;
      END IF;
    END IF;
  END PROCESS;

END  full; 
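
The product update performed by the multiplier process above, for the case where both operands are known, can be summarised as a small pure function. The Python sketch below is an illustration only. The function name `next_product` and the boolean arguments (True standing for F1, False for F0) are assumptions. In the VHDL model the accumulate and subtract controls are the values registered on the previous clock edge, while LE_OE is sampled directly; the sketch leaves out that timing, the unknown-operand branch, and the conversion to 16 output bits.

```python
# Illustrative restatement of the product update in the multiplier model.
# next_product() and its boolean arguments are hypothetical conventions.


def next_product(p, x, y, acc, sub_add, le_oe):
    """Return the new value of p when both x and y are known.

    p       : current accumulated product
    x, y    : operands already decoded to integers
    acc     : True selects accumulate mode (accr = F1 in the model)
    sub_add : True selects x*y - p, False selects p + x*y
    le_oe   : with acc False, True passes y through, False gives x*y
    """
    if acc:
        if sub_add:
            return x * y - p
        return p + x * y
    if le_oe:
        return y
    return x * y
```

Starting from p = 0, a plain multiply of 3 and 4 gives 12; a following accumulate step with operands 2 and 5 gives 22.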
APPENDIX D - COSTFUNCTION TESTBENCH


-- COSTFUNCDFFCLKINV_CHIP3_CHIP1_CONTROL&ASSIGN_PIN_CHIP4.VHD
-- Costfunction test bench program
-- CHIP3 and CHIP1 are full 16/12 bit adders with register and
-- also have feedback to the adder input.
-- They are the programmed EP1800.
-- CONTROL is a programmed EP1800 for the state machine
-- and pc interface.
-- NOW connect CONTROL SIGNALS & PC_INTERFACE
-- NOW contain sizeprom and AM29510_MULTIPLIER/ACC

library EP1800LIB, SHU;
use EP1800LIB.EP1800_PACK.all, SHU.EPROM_PACK.all;
USE work.mingpack.ALL;

entity costfuncshu is
end costfuncshu;

architecture portion of costfuncshu is
  component EP1800
    generic (JEDEC : in string);
    port (pin_14, pin_15, pin_16, pin_17,
          pin_19, pin_20, pin_21, pin_22,
          pin_48, pin_49, pin_50, pin_51,
          pin_53, pin_54, pin_55, pin_56 : in tri_state := '0';
          pin_2, pin_3, pin_4, pin_5, pin_6, pin_7,
          pin_8, pin_9, pin_10, pin_11, pin_12, pin_13,
          pin_23, pin_24, pin_25, pin_26, pin_27, pin_28,
          pin_29, pin_30, pin_31, pin_32, pin_33, pin_34,
          pin_36, pin_37, pin_38, pin_39, pin_40, pin_41,
          pin_42, pin_43, pin_44, pin_45, pin_46, pin_47,
          pin_57, pin_58, pin_59, pin_60, pin_61, pin_62,
          pin_63, pin_64, pin_65, pin_66, pin_67, pin_68
          : inout tri_state := '0');
  end component;

  -- configuration specification
  for all : ep1800 use entity EP1800LIB.ep1800(structural);

  component AM29510
    port (x0, x1, x2, x3, x4, x5, x6, x7,
          x8, x9, x10, x11, x12, x13, x14, x15,
          y0, y1, y2, y3, y4, y5, y6, y7,
          y8, y9, y10, y11, y12, y13, y14, y15,
          tc, acc, sub_add, le_oe,
          clkx, clky, clkp : IN tri_state := '0';
          p16, p17, p18, p19, p20, p21, p22, p23,
          p24, p25, p26, p27, p28, p29,
          p30, p31 : OUT tri_state := '0');
  end component;

  -- configuration specification
  for all : AM29510 use entity WORK.AM29510(FULL);

  component SIZEPROM
    port (x0, x1, x2, x3, x4, x5, x6, x7,
          x8, x9, x10, x11, x12, x13, x14, x15 : IN tri_state := '0';
          y0, y1, y2, y3, y4, y5, y6, y7,
          y8, y9, y10, y11, y12, y13, y14, y15 : OUT tri_state := '0');
  end component;

  -- configuration specification
  for all : sizeprom use entity WORK.SIZEPROM(FULL);

  signal INVl, CLR_REQ_L1, LATCHl : tri_state;
  signal CLK : tri_state := '0';
  signal PCCLR_L1 : tri_state := '1';
  signal X300_L, QOUT_L, QOUTE, QIN_L, X304_L, X306_L,
         X308_L, X30A_L, AEN, IOR_L, IOW_L : tri_state;
  signal RAM_MAC, PIPE, CLKX, CLKY, CF_DONE : tri_state;
  signal ENABLE, ENi_L, ENj_L : tri_state;
  signal A: tri_vector(0 to 9) := "0000000001";
  signal S: tri_vector(0 to 8) := "000000000";
  signal T: tri_vector(0 to 15) := "0000000000000000";
  signal Y: tri_vector(0 to 15) := "0000000000000000";
  signal Q: tri_vector(0 to 9) := "0000000000";
  signal U: tri_vector(0 to 11) := "000000000000";
  signal V: tri_vector(0 to 15) := "0000000000000000";
  signal P: tri_vector(0 to 15) := "0000000000000000";
  signal M: tri_vector(0 to 13) := "00000000000000";
  signal INi: tri_vector(1 to 16) := "0000000000000000";
  signal INj: tri_vector(1 to 16) := "0000000000000000";
  signal Oi: tri_vector(0 to 15) := "0000000000000000";
  signal Oj: tri_vector(0 to 15) := "0000000000000000";
  signal gnd: tri_state := '0';
  signal vcc: tri_state := '1';
  signal eval : boolean := false;
begin 

  -- use named association interface list.
  -- associated lists are according to
  -- chip map from ALTERA Design Processor Utilization Report.

  CHIP3: EP1800 generic map ("chip3.jed")
    port map
      (pin_2=>gnd, pin_3=>gnd, pin_4=>gnd, pin_5=>Y(1),
       pin_6=>Y(8), pin_7=>Y(12), pin_8=>Y(14), pin_9=>Y(15),
       pin_10=>T(1), pin_14=>CLR_REQ_L1, pin_15=>INVl,
       pin_16=>LATCHl, pin_17=>T(3), pin_19=>T(4),
       pin_20=>T(5), pin_21=>T(6), pin_22=>T(7),
       pin_23=>gnd, pin_24=>gnd, pin_25=>gnd,
       pin_27=>gnd, pin_28=>gnd, pin_29=>gnd,
       pin_30=>gnd, pin_31=>gnd, pin_32=>gnd,
       pin_33=>Y(9),
       pin_34=>Y(10), pin_36=>gnd, pin_37=>gnd, pin_38=>gnd,
       pin_39=>gnd, pin_40=>Y(2), pin_41=>Y(3),
       pin_42=>Y(4), pin_43=>Y(5), pin_44=>gnd,
       pin_45=>gnd, pin_46=>T(2),
       pin_48=>T(8), pin_49=>T(9),
       pin_50=>T(10), pin_51=>T(11), pin_53=>T(12),
       pin_54=>T(13), pin_55=>T(14), pin_56=>T(15),
       pin_61=>gnd, pin_62=>T(0), pin_63=>Y(0),
       pin_64=>Y(6), pin_65=>Y(7),
       pin_66=>Y(11), pin_67=>Y(13), pin_68=>gnd);

  CHIP1: EP1800 generic map ("chip1.jed")
    port map
      (pin_2=>U(0), pin_3=>U(1), pin_4=>U(4), pin_5=>U(6),
       pin_6=>U(8), pin_7=>U(9), pin_8=>gnd, pin_9=>gnd,
       pin_14=>gnd, pin_15=>gnd,
       pin_16=>gnd, pin_17=>CLR_REQ_L1, pin_19=>INVl,
       pin_20=>LATCHl, pin_21=>Q(0), pin_22=>Q(1),
       pin_23=>gnd, pin_24=>gnd, pin_25=>gnd, pin_26=>gnd,
       pin_27=>gnd, pin_28=>gnd, pin_29=>gnd,
       pin_30=>gnd, pin_31=>gnd, pin_32=>gnd,
       pin_33=>gnd,
       pin_34=>gnd, pin_36=>gnd, pin_37=>gnd, pin_38=>gnd,
       pin_39=>gnd, pin_40=>gnd, pin_41=>gnd,
       pin_42=>gnd, pin_43=>gnd, pin_44=>gnd,
       pin_45=>gnd, pin_46=>gnd, pin_47=>gnd,
       pin_48=>Q(2), pin_49=>Q(3),
       pin_50=>Q(4), pin_51=>Q(5), pin_53=>Q(6),
       pin_54=>Q(7), pin_55=>Q(8), pin_56=>Q(9),
       pin_57=>gnd, pin_61=>gnd,
       pin_62=>U(2), pin_63=>U(3), pin_64=>U(5),
       pin_65=>U(7),
       pin_66=>U(10), pin_67=>U(11), pin_68=>gnd);

  CONTROL: EP1800 generic map ("control.jed")
    port map
      (pin_2=>gnd, pin_3=>gnd, pin_4=>gnd, pin_5=>gnd,
       pin_6=>gnd, pin_7=>gnd, pin_8=>gnd, pin_9=>gnd,
       pin_10=>S(0), pin_11=>S(1), pin_12=>S(2),
       pin_13=>S(3),
       pin_14=>gnd, pin_15=>A(0),
       pin_16=>A(1),
       pin_17=>A(2), pin_19=>A(3),
       pin_20=>A(4), pin_21=>A(5), pin_22=>A(6),
       pin_23=>S(8), pin_24=>X300_L, pin_25=>QOUT_L,
       pin_26=>QOUTE,
       pin_27=>gnd, pin_28=>gnd, pin_29=>gnd,
       pin_30=>gnd, pin_31=>gnd, pin_32=>gnd,
       pin_33=>gnd,
       pin_34=>QIN_L, pin_36=>gnd, pin_37=>gnd, pin_38=>gnd,
       pin_39=>gnd, pin_40=>gnd, pin_41=>gnd,
       pin_42=>gnd, pin_43=>gnd, pin_44=>X304_L,
       pin_45=>X306_L, pin_46=>X308_L, pin_47=>X30A_L,
       pin_48=>A(7), pin_49=>A(8),
       pin_50=>A(9),
       pin_51=>AEN, pin_53=>CLK,
       pin_54=>IOR_L, pin_55=>IOW_L, pin_56=>PCCLR_L1,
       pin_57=>S(4), pin_58=>S(5), pin_59=>S(6),
       pin_60=>S(7), pin_61=>gnd,
       pin_62=>gnd, pin_63=>gnd, pin_64=>gnd,
       pin_65=>gnd,
       pin_66=>gnd, pin_67=>gnd, pin_68=>gnd);


  PROM_SIZE: SIZEPROM
    port map (U(0), U(1), U(2), U(3), U(4), U(5), U(6), U(7),
              U(8), U(9), U(10), U(11), gnd, gnd, gnd, gnd,
              V(0), V(1), V(2), V(3), V(4), V(5), V(6), V(7),
              V(8), V(9), V(10), V(11), V(12),
              V(13), V(14), V(15));

  MULTIPLY: AM29510
    port map (V(0), V(1), V(2), V(3), V(4), V(5), V(6), V(7),
              V(8), V(9), V(10), V(11), V(12),
              V(13), V(14), V(15),
              Y(0), Y(1), Y(2), Y(3), Y(4), Y(5), Y(6), Y(7),
              Y(8), Y(9), Y(10), Y(11), Y(12),
              Y(13), Y(14), Y(15),
              RAM_MAC, RAM_MAC, vcc, vcc, CLKX, CLKY, CLKY,
              P(0), P(1), P(2), P(3), P(4), P(5), P(6), P(7),
              P(8), P(9), P(10), P(11), P(12),
              P(13), P(14), P(15));

  gnd <= '0';
  vcc <= '1';

  CLK <= bit_to_tri(NOT tri_to_bit(CLK)) AFTER 300 ns;
  PCCLR_L1 <= '1',
              '0' AFTER 10 ns,
              '1' AFTER 310 ns;

  process (ENABLE, INVl, RAM_MAC)
  begin
    if ENABLE='1' then
      if RAM_MAC='0' then
        T <= "0100000000000000";  -- ACC[i]
      else
        T <= "1111000000000000";  -- TABLE[i]
        Q <= "1111000000";        -- SIZE[i]
      end if;
    else
      if INVl='1' then
        T <= "1000000000000000";  -- TABLE[j]
        Q <= "1000000000";        -- SIZE[j]
      end if;
    end if;
  end process;


  CLR_REQ_L1 <= S(0);
  ENABLE     <= S(1);
  ENi_L      <= bit_to_tri(NOT tri_to_bit(ENABLE));
  ENj_L      <= ENABLE;
  LATCHl     <= S(2);
  INVl       <= S(3);
  RAM_MAC    <= S(4);
  CLKY       <= S(5);
  CLKX       <= S(6);
  PIPE       <= S(7);
  CF_DONE    <= S(7);


eval  <=  true  AFTER  15  us; 
ASSERT  (S(7)  /=  '1') 

ASSERT  (NOT  eval) 

REPORT  "SIMULATION  DONE"; 


end portion;

APPENDIX E - RESULT OF THE SIMULATION